
I wrote a CSP solver a while ago. The bugs were alarmingly often cases of incomplete enumeration of solutions: the solver would say "Yep, here are all 2412 solutions to that problem", but there were more solutions out there that it missed. I usually didn't know how many solutions there should be, so it was hard to distinguish between "all of them" and "95% of them". That also threw a lot of doubt on the solver returning unsat - was the problem really unsat, or was the solver failing to look in part of the search space?

I didn't find a plausible testing strategy for convincing myself that the last of that class of error was gone, and ultimately stopped working on it as a result.



The best practical technique is to generate random problems, compare your solver with another (or two), and hope you didn't both have the same bugs :)

You can also often make progress by "shuffling" problems (reordering constraints / variables), or solving sub-problems + super-problems.
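For instance (a toy sketch in Python - the thread doesn't name a language, and `solve_all` is a hypothetical brute-force stand-in for a real solver), shuffling a problem should never change its solution set:

```python
import random
from itertools import product

def solve_all(n_vars, clauses):
    """Tiny brute-force SAT enumerator. Clauses are lists of signed
    1-based literals (DIMACS-style); returns the set of satisfying
    assignments as tuples of bools."""
    sols = set()
    for assign in product([False, True], repeat=n_vars):
        if all(any(assign[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            sols.add(assign)
    return sols

# Metamorphic check: reordering clauses and literals must not change
# the solution set.
clauses = [[1, -2], [2, 3], [-1, -3]]
shuffled = [random.sample(c, len(c)) for c in clauses]
random.shuffle(shuffled)
assert solve_all(3, clauses) == solve_all(3, shuffled)
```

The same harness extends to sub-problems: adding a clause can only shrink the returned set, which you can assert with `issubset`.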


This is my go-to technique for tricky code.

Most of the time it's pretty easy to write a dumb, slow, almost certainly correct version of the thing in question and fuzz the two implementations against each other on small inputs. E.g., if you're speeding up floating-point division by 2 with bit hacks in the exponent field, compare it to just dividing by 2 (and for up to 32-bit floats, just check all of them). If you're implementing a sort, compare it to bubble sort. If you're implementing a SAT solver, compare it to just checking every possible assignment - even with a comparison solver that slow, you can check every possible problem up to ~4-6 terms (depending a bit on how you define a "term") and spot-check problems with many terms and 30 independent variables.
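The floating-point example might look like this (a Python sketch of my own - `div2_bithack` and `f32` are hypothetical names, the real thing would be C bit-twiddling, and the sampled sweep below would be an exhaustive 2^32 loop):

```python
import struct

def f32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def div2_bithack(x: float) -> float:
    """Divide a 32-bit float by 2 by decrementing the exponent field."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exp = (bits >> 23) & 0xFF
    if exp <= 1 or exp == 0xFF:      # zero/subnormal edge cases, inf/NaN
        return f32(x / 2.0)          # fall back to the reference path
    return struct.unpack("<f", struct.pack("<I", bits - (1 << 23)))[0]

# Differential test: compare against the dumb, obviously correct version.
for bits in range(0, 2**32, 65537):  # sampled sweep; exhaustive is 2**32
    x = struct.unpack("<f", struct.pack("<I", bits))[0]
    if x != x:                       # skip NaN (NaN != NaN)
        continue
    assert div2_bithack(x) == f32(x / 2.0), hex(bits)
```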


Yes, I've done exactly this with my constraint solver ( https://github.com/minion/minion ). Famous last words, but almost all bugs that have been reported have been reproducible with 6 variables of domain size at most 6 (maybe even fewer).

The only exceptions are cases where integer overflow occurs, and those really need separate tests. I'm mostly handling that by putting fairly conservative bounds on maximum integer values and asserting if those are ever violated, because (particularly for academic software) an assertion failure where your solver might have worked but is undertested is, in my mind, infinitely better than a wrong answer.


What sorts of code normally cause integer overflows in minion? I have to be super cautious about float overflows in most of my code, but with integers I'm usually implementing a bitmask or a 64-bit counter or something similarly not prone to such problems.


Just boring old arithmetic; it's quite common for people to write maths problems which end up overflowing a 64-bit integer.

Here's one I remember someone writing:

There is a 'numbers game' on the UK show Countdown, where you are given a series of 6 numbers and need to add, multiply, subtract and divide them to make a target. They were considering generalisations.

They wanted to allow '500' as an input number, and also allow 8 numbers. While their targets were numbers less than 10,000, 500^8 overflows a 64-bit integer, and we often want to find things like upper and lower bounds early on.


I'd be interested to know if you could find problems by looking at invariants over partitions of the space.

Like

Is the number of solutions with v1=false plus the number with v1=true equal to the number with v1 free?

If we have n constraints, then there are 2^n variants where you negate a subset of the constraints. Does the sum of solution counts over the 2^n subproblems equal the size of the variable space? (Each assignment satisfies exactly one pattern of satisfied/negated constraints, so the counts should partition the space.)
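With a brute-force counter, the first invariant is easy to state (a Python sketch; `count_solutions` and the toy problem are hypothetical illustrations, not anyone's actual solver):

```python
from itertools import product

def count_solutions(n_vars, constraints, fixed=None):
    """Brute-force count of boolean assignments satisfying every
    constraint. `constraints` is a list of predicates over a tuple of
    bools; `fixed` optionally pins some variables to values."""
    fixed = fixed or {}
    count = 0
    for assign in product([False, True], repeat=n_vars):
        if any(assign[i] != v for i, v in fixed.items()):
            continue
        if all(c(assign) for c in constraints):
            count += 1
    return count

# Toy problem: (v0 or v1) and (not v2)
cons = [lambda a: a[0] or a[1], lambda a: not a[2]]

free  = count_solutions(3, cons)
split = count_solutions(3, cons, {0: False}) + count_solutions(3, cons, {0: True})
assert free == split   # the partition invariant from the comment
```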


That's a good idea.

The search method is: propagate constraints for a while, then pick a variable and bisect its domain, solve each side, and combine.
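A toy version of that search, in Python (the names are mine, and real constraint propagation is replaced by checking constraints only on fully assigned tuples):

```python
def bisect_search(domains, constraints):
    """Enumerate solutions by bisecting domains. `domains` is a list of
    value lists; `constraints` are predicates on complete assignments."""
    # If every domain is a singleton, test the full assignment.
    if all(len(d) == 1 for d in domains):
        assign = tuple(d[0] for d in domains)
        return [assign] if all(c(assign) for c in constraints) else []
    # Pick the first splittable variable and bisect its domain.
    i = next(j for j, d in enumerate(domains) if len(d) > 1)
    mid = len(domains[i]) // 2
    lo = domains[:i] + [domains[i][:mid]] + domains[i + 1:]
    hi = domains[:i] + [domains[i][mid:]] + domains[i + 1:]
    # Combine: the two halves partition the solutions.
    return bisect_search(lo, constraints) + bisect_search(hi, constraints)

doms = [[0, 1, 2], [0, 1, 2]]
cons = [lambda a: a[0] + a[1] == 2]
assert sorted(bisect_search(doms, cons)) == [(0, 2), (1, 1), (2, 0)]
```

Because the halves partition the search space, varying which variable you split on (or in which order) should never change the combined solution set - which is exactly the kind of invariant to fuzz.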

Generalising slightly from your suggestion, varying the order in which the search runs, and whether various propagations or optimisations are active, should only change runtime and not the set of results.

The other thing I've discovered since that project is clang's AST-based branch coverage recording. Getting to the point where smallish unit tests (which can be perturbed as above, or enumerated) exercise each path through the solver would probably shake out the bugs.


Maybe one could attempt to use a proof assistant to prove that the algorithm is correct?


For what it's worth, I'm now working on a proof assistant, so yeah, that's the game plan. I think the straight-line case for the solver is probably OK, but there's a huge range of optimisations out there and it's far too easy to introduce errors there.


At present we are a long way from proving general constraint solvers correct, as they are built from so many little algorithms that interact in weird ways.

It's an interesting area, and I hope we'll get there in the future.


Or just prove to yourself with pen and paper that the algorithm works, using structural induction and other techniques. Then it's just a matter of whether the code follows the spec.



