Paths to Discovery: The Valiant-Vazirani Theorem
This post aims to give a plausible reconstruction of the discovery process behind the Valiant-Vazirani Theorem, a beautiful and important result in complexity theory. If you haven't heard of this before, don't worry, read on. This theorem's discovery path, which I'll render as an internal dialogue (not portraying Valiant or Vazirani themselves), illustrates well, I think, some characteristic patterns of complexity research.
***
So today is the day... I will prove P = NP via a supremely clever algorithm for Boolean Satisfiability! Let me just arrange my pencils and paper, have a snack, and... let's go!
[Tries, fails; gets another snack; putters.]
Alright, this is hopeless. What is a possible source of the difficulty?
Maybe when algorithms are building up partial satisfying assignments, in different stages they stumble upon fragments of different satisfying assignments, and get caught up trying to combine them in a 'Frankenstein' assignment that doesn't work.
For instance, say we have a formula expressing that the 6 variables all take a common value. Then 000*** and ***111 are fragments of satisfying assignments, but they don't combine successfully.
Is there a specific way to overcome this difficulty, to spot-check fragments for mutual compatibility? Well, that sounds as hard as the original problem.
But what if this problem didn't arise? Maybe I could find satisfying assignments under the restrictive assumption that the formulae I'm given have at most one solution. Such an algorithm would still be useful (e.g. for finding prime factorizations)--and, I'll add, impressive... OK, time for round 2!
[Tries, fails]
Damn, useless! Ah well, it's a hard problem. 'A' for effort, now how about some more chips and salsa?
NO! Keep it together. What did your predecessors do when they failed to solve SAT? Did they throw up their hands and walk away? Or did they invent a sophisticated justification for their own failure? The latter: NP-completeness. They're feeling pretty smug about their 'failure' now... can we do something like that?
Well, ideally, we'd show that solving this restricted problem is JUST as hard as solving general SAT. How would we show this? The general reduction method would be--make sure I've got the directions straight...
We want to find a polynomial-time computable function R mapping formulas to formulas, such that:
-If P is satisfiable, R(P) has exactly one satisfying assignment;
-If P is unsatisfiable, so is R(P).
Then, if we had an algorithm A for our restricted SAT problem, and we wanted to determine the satisfiability of a general formula P, we'd just compute A(R(P)) to get the answer.
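Just to fix the shape of this before going further, here is a minimal Python schematic; R and A are placeholders for objects we don't yet have, not the actual construction:

```python
# A minimal schematic of the reduction template (placeholders only).

def solve_general_sat(P, R, A):
    """Decide satisfiability of an arbitrary formula P, assuming:
    R -- a poly-time map sending satisfiable formulas to formulas with
         exactly one satisfying assignment, and unsatisfiable formulas
         to unsatisfiable ones;
    A -- an algorithm only guaranteed to work on formulas with at most
         one satisfying assignment."""
    return A(R(P))  # R(P) meets A's promise, so the answer carries back to P
```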
But how would such a reduction work? Deciding which assignment to preserve seems like a tall order, granted that we don't even know an efficient way to find any assignment. If we fashioned the formula R(P) as a more constrained version of P, the danger is that we'd snuff out all satisfying assignments. On the other hand, what other technique could we possibly use to build R(P)? We can't allow in spurious new satisfying assignments that have nothing to do with those of P.
Let's stick with the idea of making R(P) a more restrictive version of P, but let's abstract from our problem a bit. We need to stop being tempted by the desire for information that would be useful in the reduction, but is itself NP-hard to compute. Instead of a boolean formula with potentially analyzable structure, suppose we are faced with an arbitrary, 'black-box' 0/1-valued function f on a domain of size m = 2^n. We want to modify f in a way that 'kills' all but one of its 1-values. Can we succeed in this setting?
One way to kill 1-values is to create a function g(x) = f(x)*h(x), where h(x) is another 0/1-valued function. Then g is indeed a restricted version of f.
If we choose h(x) = 1, g = f and we make no progress. If we choose h(x) = 0, g = 0 and we have erased the information we need. We need some happy medium, but we don't have the resources to analyze f in any meaningful way while producing h.
Thus, the idea of letting h(x) be a random function is almost forced on us! Let's try it. (Note that at this point we are more or less committing to a reduction with some probability of error... oh well.)
Oops, there's a problem: suppose that f has many 1-values. Then the probability that g has at most one 1-value is very low. We could instead let g(x) = f(x)*h[1](x)*...*h[j](x) for some larger number j of independent random functions, but this would almost surely kill all the 1-values of f if there were few to begin with.
Luckily, the range of sensible choices for j isn't too large: if j >> n, then almost surely all 1-values of f are killed off (f has at most 2^n of them, and each survives with probability only 2^-j). So let's just plan on guessing the 'right' value of j for the f we are given.
Does this approach do what we want it to? Is there, with fair probability, some j for which g(x) = f(x)*h[1](x)*...*h[j](x) has only a single 1-value? Let's think of multiplying by the h[i]'s sequentially--with each stage, each 1-value in the 'population' is independently 'killed' with probability 1/2. It seems not unlikely that at some stage exactly one remains (call this event 'Isolation'). [Try verifying this.] And since we can guess the right j with at least an inverse-polynomial probability, if we repeat the experiment a poly(n) number of times, we should succeed in Isolation at least once with high probability--assuming f has any 1-values, of course.
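Here is a quick Monte Carlo check of this intuition in Python, with truly random killing (an illustrative sanity check, not a substitute for the exercise):

```python
import random

def isolation_happens(num_ones, max_stages=60):
    """Sequentially multiply in truly random h[i]'s: at each stage, every
    surviving 1-value is independently kept with probability 1/2.
    Return True if some stage leaves exactly one survivor."""
    survivors = num_ones
    for _ in range(max_stages):
        survivors = sum(random.random() < 0.5 for _ in range(survivors))
        if survivors == 1:
            return True
        if survivors == 0:
            return False
    return False

# Empirically, the chance of Isolation stays bounded away from 0 no
# matter how many 1-values we start with.
for num_ones in (2, 10, 100, 1000, 10000):
    trials = 2000
    hits = sum(isolation_happens(num_ones) for _ in range(trials))
    print(f"{num_ones:>6} initial 1-values: Isolation in {hits / trials:.2f} of trials")
```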
There is one hurdle: choosing even a single truly random function h on n bits is computationally expensive, since almost certainly this h takes around 2^n bits even to describe. We can't afford this. Is there some smaller stand-in for this sample space that is 'random enough' for our purposes? Maybe 'pairwise independence' of the values of h is enough (this is the condition that each value h(x) is an unbiased random bit, and that for any x != y the values h(x), h(y) are independent of each other)... [Show that, indeed, this is true: Isolation occurs for some j with probability at least 1/8.]
If so, each random function h[i](x) can be replaced by the much more concisely represented parity function a*x + b (mod 2), where a is a random 0/1 n-vector (so a*x is an inner product mod 2) and b is a random bit. These parity constraints can be added (in conjunctive form) to the formula representation of f, allowing us to work within the original problem framework.
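To sanity-check that this family really is pairwise independent, here is a brute-force Python check over a small n (an illustration of the claim, not part of the reduction itself):

```python
from itertools import product

def parity_hash(a, b, x):
    """h(x) = <a, x> + b mod 2, for a 0/1 vector a and a bit b."""
    return (sum(ai & xi for ai, xi in zip(a, x)) + b) % 2

# For every pair x != y, as (a, b) ranges over all choices, the pair
# (h(x), h(y)) should hit each of the 4 possible values equally often.
n = 3
points = list(product((0, 1), repeat=n))
for x in points:
    for y in points:
        if x == y:
            continue
        counts = {}
        for a in product((0, 1), repeat=n):
            for b in (0, 1):
                pair = (parity_hash(a, b, x), parity_hash(a, b, y))
                counts[pair] = counts.get(pair, 0) + 1
        assert all(c == 2 ** (n - 1) for c in counts.values()), (x, y, counts)
print("pairwise independence verified for n =", n)
```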
Let's review the technique: given our formula f, we repeatedly produce functions of the form g = f*h[1]*...*h[j], where j is random and the h[i] are mutually independent and individually pairwise independent over their domain. If f has a 1-value, then each g has exactly one 1-value with non-negligible probability, so if we repeat many times, with high probability we achieve Isolation at least once, and our assumed algorithm for finding unique satisfying SAT assignments finds a solution to this g, which is also a satisfying assignment to f. If we don't find such an assignment, chances are f was unsatisfiable to begin with.
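Putting the pieces together, here is a Python sketch of the whole pipeline in the black-box setting. The unique_solver inside is only a brute-force stand-in for the assumed algorithm (we don't have an efficient one!), and the names and parameters are mine, so this demonstrates the reduction's logic rather than its efficiency:

```python
import random
from itertools import product

def random_parity_constraint(n):
    """A random h: h(x) = <a, x> + b mod 2, with a and b chosen uniformly."""
    a = [random.randrange(2) for _ in range(n)]
    b = random.randrange(2)
    return lambda x: (sum(ai & xi for ai, xi in zip(a, x)) + b) % 2

def find_satisfying_assignment(f, n, repetitions=None):
    """Sketch of the reduction.  `unique_solver` stands in for the ASSUMED
    algorithm that finds the satisfying assignment of an input promised to
    have at most one 1-value; here it is simulated by brute force."""
    def unique_solver(g):
        solutions = [x for x in product((0, 1), repeat=n) if g(x)]
        return solutions[0] if len(solutions) == 1 else None

    repetitions = repetitions or 10 * (n + 2)
    for _ in range(repetitions):
        j = random.randrange(n + 2)        # guess the 'right' number of constraints
        hs = [random_parity_constraint(n) for _ in range(j)]
        g = lambda x: f(x) and all(h(x) for h in hs)
        candidate = unique_solver(g)
        if candidate is not None and f(candidate):   # always double-check against f
            return candidate
    return None   # most likely f was unsatisfiable to begin with

# Example: f says "all 6 variables take a common value".
f = lambda x: int(all(xi == x[0] for xi in x))
print(find_satisfying_assignment(f, 6))   # almost always the all-0 or all-1 assignment
```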
***
(I left some of the final steps as exercises both because they're nifty, manageable puzzles, and because they're not distinctively complexity-theoretic.)
What are the methodological ideas illustrated here? Let's see:
-Don't single-mindedly pursue goals (like showing P = NP); look for more modest achievements (like finding unique satisfying assignments) that attempt to remove some important perceived difficulties (here, 'interference' between multiple satisfying assignments).
-Sometimes seemingly modest goals, like this one, are essentially as difficult as the original problem. Keep this in mind, and be prepared not just to report success, but also to explain failure.
-Try stripping away problem details that one doesn't know how to exploit effectively (here, by 'forgetting' the logical structure of the input formula).
-When you don't know what to do, try acting randomly. Be prepared to sacrifice in the process: here, our reduction can't prove that producing unique assignments is hard from the assumption P != NP; we must resort to the assumption RP != NP.
-Limited randomness is often almost as effective as full randomness, and much cheaper. k-wise independence (and, in other settings, almost-independence) is one such example; 'min-entropy' is another. On the other hand, it can be conceptually simpler to initially design algorithms with full randomness in mind (as we did), then inspect the analysis to see the extent to which derandomization is possible. In fact, more randomness-efficient versions of the V-V Theorem do exist (I will try to track down the papers).
I'm thinking about a 'Paths to Discovery' sequel on hardness-of-learning results, another area (also bearing the mark of Les Valiant) where the theorems are ultimately simple, but the challenge is determining the right form and target of the reduction. Any feedback, or other suggestions?