For the final project of Efficient Algorithms and Intractable Problems (CS 170), we were asked to solve the not-betweeness problem: given elements ai, where i is an integer from 1 to n, and a set of constraints c which state a certain element is not between two other elements, find an ordering of the elements which solves as many constraints as possible.
I later found out that this was a variant of betweenness. In this problem, constraints state that an element is between two other elements. Under a different name (Total Ordering Problem), Opatrný proved this was NP-complete.
As a result of an early mistake, I ended up creating several solutions (because I believed all of them were suboptimal until I caught my mistake). In retrospect, I really enjoyed the experience because I had the opportunity to explore a lot of different algorithms through the process. The team that I led for this project also finished in the top 8% of groups, with full marks on the project.
At first, I used a genetic algorithm to find the optimal solution. Since my solution was permutation based, I used the swap mutation and the OX1 crossover operator. To select parents, I used tournament selection. Because my fitness function was literally the opposite of what it should've been, this algorithm did not perform very well.
My second solution attempt used a reduction to SAT, and more specifically, 3SAT. To do this, I made boolean variables which represented ai < aj. More specifically, if ai comes before aj, then a variable xk would be true, otherwise it's false. Let's assume that the variables which indicate ai < aj , ai < ak are x1 , x2, respectively. Given that ai is not between aj and ak, we could also say that (ai < aj ∧ ai < aj) ∨ (ai > aj ∧ ai > aj). In terms of x, we have (x1 ∧ x2) ∨ (-x1 ∧ -x2)
Because SAT is a well known solver, I assumed a Python SAT solver existed. Thankfully, it did in the form of pycosat. However, it takes clauses in conjunctive normal form (CNF). To convert my clauses into CNF, I recognized that the boolean expression I built was effectively xnor. xor can be represented as (x1 ∧ -x2) ∨ (-x1 ∧ x2). If we negate this expession and apply De-Morgan's law through the expression, we get (-x1 ∨ x2) ∧ (x1 ∨ -x2), which is in CNF.
After getting a set of variables back from the SAT solver, I had to reconstruct this into an ordering. Because each variable represents an ordering between two elements, we can effectively draw a directed acyclic graph (DAG) with this information. More specifically, let us use each ai as a node in a graph. If we know ai < aj, then we can draw an edge from ai to aj. If we topologically sort this DAG, then we can get a proper ordering of the elements. I used the toposort library in Python to perform this.
It turns out that the DAG that's created can have circular dependencies. More specifically, there are instances where the solution pycosat returns has a loop which looks like ai < aj < ak < ai. Obviously, this is undesirable, and while I could work around this by iterating through the SAT solutions until there were no circular dependencies, problems of larger size would not be solvable within a reasonable time through this method.
It was (late) at this stage that I discovered my fitness function was incorrect. In simulated annearling, a huge determining factor in getting the correct solution in a reasonable amount of time is the neighbor-generating function. If a bad function is used, then it's likely that a solution will be stuck in local optima, or just not be able to traverse to the correct solution at all. Because the solutions to not-betweeness are permutation-based, I used neigbor selection algorithms similar to those used for the travelling salesman problem, another permutation-based question. It turned out that random swap (the mutation function from the genetic algorithm) was effective. However, the most effective neighbor-generating function was actually a random slide, in which one element is shifted to another location while all other elements are kept in order.