Paths to Discovery: The Valiant-Vazirani Theorem
This post aims to give a plausible reconstruction of the discovery process behind the Valiant-Vazirani Theorem, a beautiful and important result in complexity theory. If you haven't heard of it before, don't worry--read on. This theorem's discovery path, which I'll render as an internal dialogue (not portraying Valiant or Vazirani themselves), illustrates well, I think, some characteristic patterns of complexity research.
***
So today is the day... I will prove P = NP via a supremely clever algorithm for Boolean Satisfiability! Let me just arrange my pencils and paper, have a snack, and... let's go!
[Tries, fails; gets another snack; putters.]
Alright, this is hopeless. What is a possible source of the difficulty?
Maybe when algorithms are building up partial satisfying assignments, in different stages they stumble upon fragments of different satisfying assignments, and get caught up trying to combine them in a 'Frankenstein' assignment that doesn't work.
For instance, say we have a formula over 6 variables expressing that they all take a common value. Then 000*** and ***111 are both fragments of satisfying assignments, but they don't combine successfully.
Is there a specific way to overcome this difficulty, to spot-check fragments for mutual compatibility? Well, that sounds as hard as the original problem.
But what if this problem didn't arise? Maybe I could find satisfying assignments under the restrictive assumption that the formulae I'm given have at most one solution. Such an algorithm would still be useful (e.g. for finding prime factorizations)--and, I'll add, impressive... OK, time for round 2!
[Tries, fails]
Damn, useless! Ah well, it's a hard problem. 'A' for effort, now how about some more chips and salsa?
NO! Keep it together. What did your predecessors do when they failed to solve SAT? Did they throw up their hands and walk away? Or did they invent a sophisticated justification for their own failure? The latter: NP-completeness. They're feeling pretty smug about their 'failure' now... can we do something like that?
Well, ideally, we'd show that solving this restricted problem is JUST as hard as solving general SAT. How would we show this? The general reduction method would be--make sure I've got the directions straight...
We want to find a polynomial-time computable function R mapping formulas to formulas, such that:
-If P is satisfiable, R(P) has exactly one satisfying assignment;
-If P is unsatisfiable, so is R(P).
Then, if we had an algorithm A for our restricted SAT problem, and we wanted to determine the satisfiability of a general formula P, we'd just compute A(R(P)) to get the answer.
But how would such a reduction work? Deciding which assignment to preserve seems like a tall order, granted that we don't even know an efficient way to find any assignment. If we fashioned the formula R(P) as a more constrained version of P, the danger is that we'd snuff out all satisfying assignments. On the other hand, what other technique could we possibly use to build R(P)? We can't allow in spurious new satisfying assignments that have nothing to do with those of P.
Let's stick with the idea of making R(P) a more restrictive version of P, but let's abstract from our problem a bit. We need to stop being tempted by the desire for information that would be useful in the reduction, but is itself NP-hard to compute. Instead of a boolean formula with potentially analyzable structure, suppose we are faced with an arbitrary, 'black-box' 0/1-valued function f on a domain of size m = 2^n. We want to modify f in a way that 'kills' all but one of its 1-values. Can we succeed in this setting?
One way to kill 1-values is to create a function g(x) = f(x)*h(x), where h(x) is another 0/1-valued function. Then g is indeed a restricted version of f.
If we choose h(x) = 1, g = f and we make no progress. If we choose h(x) = 0, g = 0 and we have erased the information we need. We need some happy medium, but we don't have the resources to analyze f in any meaningful way while producing h.
Thus, the idea of letting h(x) be a random function is almost forced on us! Let's try it. (Note that at this point we are more or less committing to a reduction with some probability of error... oh well.)
Oops, there's a problem: suppose that f has many 1-values. Then the probability that g has at most one 1-value is very low. We could instead let g(x) = f(x)*h[1](x)*...*h[j](x), for a larger number of random functions, but this would almost surely kill all the 1-values of f if there were few to begin with.
Luckily, the range of sensible choices for j isn't too large: if j >> n, then almost surely all 1-values of f are killed off. So let's just plan on guessing the 'right' value of j for the f we are given.
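To put a rough number behind that claim (a back-of-the-envelope bound of my own, not part of the original argument): if f has N <= 2^n 1-values, each one survives all j stages with probability 2^{-j}, so

\[ \Pr[\text{some 1-value survives}] \;\le\; \mathbb{E}[\#\text{survivors}] \;=\; N \cdot 2^{-j} \;\le\; 2^{n-j}, \]

which is negligible once j exceeds n by more than a few. Conversely, the interesting regime is j near log2(N), where the expected number of survivors is about 1; since log2(N) lies somewhere in {0, ..., n}, guessing j uniformly from {1, ..., n+1} lands on a useful value with probability at least 1/(n+1).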
Does this approach do what we want it to? Is there, with fair probability, some j for which g(x) = f(x)*h[1](x)*...*h[j](x) has only a single 1-value? Let's think of multiplying by the h[i]'s sequentially--at each stage, each 1-value in the 'population' is independently 'killed' with probability 1/2. It seems not unlikely that at some stage exactly one remains (call this event 'Isolation'). [Try verifying this.] And since we can guess the right j with at least an inverse-polynomial probability, if we repeat the experiment a poly(n) number of times, we should succeed in Isolation at least once with high probability--assuming f has any 1-values, of course.
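(For the skeptical or the simulation-minded, here's a quick Monte Carlo check I wrote for this idealized model--fully random h's, one fair coin per surviving 1-value per stage. It is only an illustration, not a proof.)

import random

def isolation_probability(N, trials=50_000):
    """Estimate Pr[some stage has exactly one surviving 1-value], starting
    from N 1-values, when each stage independently kills each survivor
    with probability 1/2 (the idealized fully-random model)."""
    successes = 0
    for _ in range(trials):
        population = N
        while population > 1:
            # Flip one fair coin per survivor; count how many survive this stage.
            population = bin(random.getrandbits(population)).count("1")
        successes += (population == 1)  # we exited at exactly one, not at zero
    return successes / trials

for N in (1, 2, 4, 16, 256, 4096):
    print(f"N = {N:4d}: estimated isolation probability = {isolation_probability(N):.3f}")

(Because the population only shrinks, checking the value at which the loop exits is the same as asking whether the process ever passes through exactly one survivor.)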
There is one hurdle: choosing even a single truly random function h on n bits is computationally expensive, since almost certainly this h takes around 2^n bits even to describe. We can't afford this. Is there some smaller stand-in for this sample space, that is 'random enough' for our purposes? Maybe 'pairwise independence' of the values of h is enough (this is the condition that, for any x != y, the values h(x), h(y) are independent of each other)... [Show that, indeed, this is true: Isolation occurs for some j with probability at least 1/8.]
If so, each random function h[i](x) can be replaced by the much more concisely represented function (a*x + b) mod 2, where a is a random n-bit vector and b a random bit--that is, the XOR of a random subset of the input bits with a random constant bit. These constraints can be added (in conjunctive form) to the formula representation of f, allowing us to work within the original problem framework.
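(As a concrete sanity check--my own illustration, not part of the argument--here is a brute-force verification that this parity-based h really is pairwise independent: for any two distinct inputs x != y, enumerating every choice of (a, b) shows that each of the four outcomes for (h(x), h(y)) occurs exactly a quarter of the time.)

from itertools import product

def h(a, b, x):
    # h(x) = (a*x + b) mod 2, where a*x is the inner product of bit-vectors a and x.
    return (sum(ai * xi for ai, xi in zip(a, x)) + b) % 2

def check_pairwise_independence(n):
    """For every pair of distinct n-bit inputs x != y, tabulate (h(x), h(y))
    over all 2^n * 2 seeds (a, b) and verify the joint distribution is
    uniform on {0,1}^2."""
    inputs = list(product((0, 1), repeat=n))
    seeds = [(a, b) for a in product((0, 1), repeat=n) for b in (0, 1)]
    for x in inputs:
        for y in inputs:
            if x == y:
                continue
            counts = {(u, v): 0 for u in (0, 1) for v in (0, 1)}
            for a, b in seeds:
                counts[(h(a, b, x), h(a, b, y))] += 1
            assert all(c == len(seeds) // 4 for c in counts.values()), (x, y, counts)
    print(f"n = {n}: pairwise independence verified over all {len(seeds)} seeds")

check_pairwise_independence(3)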
Let's review the technique: given our formula f, we repeatedly produce functions of the form g = f*h[1]*...*h[j], where j is random and the h[i] are mutually independent and individually pairwise independent over their domain. If f has a 1-value, then each g has exactly one 1-value with non-negligible probability, so if we repeat many times, with high probability we achieve Isolation at least once, and our assumed algorithm for finding unique satisfying SAT assignments finds a solution to this g, which is also a satisfying assignment to f. If we don't find such an assignment, chances are f was unsatisfiable to begin with.
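(To tie the pieces together, here is a toy end-to-end sketch of mine. The function unique_assignment_finder below is a brute-force stand-in for the assumed algorithm A--the entire premise is that such an A exists and runs efficiently, which the brute-force version certainly does not.)

import random
from itertools import product

def parity_constraint(n):
    """A random pairwise-independent constraint: x passes iff (a*x + b) mod 2 == 1."""
    a = [random.getrandbits(1) for _ in range(n)]
    b = random.getrandbits(1)
    return lambda x: (sum(ai * xi for ai, xi in zip(a, x)) + b) % 2 == 1

def unique_assignment_finder(g, n):
    """Stand-in for the assumed algorithm A: return g's satisfying assignment
    if it has exactly one, else None.  (Brute force, for illustration only.)"""
    sols = [x for x in product((0, 1), repeat=n) if g(x)]
    return sols[0] if len(sols) == 1 else None

def find_satisfying_assignment(f, n, repetitions=None):
    """The randomized reduction: repeatedly conjoin f with j random parity
    constraints (j guessed at random), hoping to isolate a single 1-value."""
    repetitions = repetitions or 20 * (n + 1)
    for _ in range(repetitions):
        j = random.randint(1, n + 1)                  # guess the 'right' j
        hs = [parity_constraint(n) for _ in range(j)]
        g = lambda x, hs=hs: f(x) and all(h(x) for h in hs)
        x = unique_assignment_finder(g, n)
        if x is not None:      # any solution of g also satisfies f
            return x
    return None   # probably unsatisfiable (with a small chance we were just unlucky)

# Toy example: f says all 6 variables take a common value (two satisfying assignments).
f = lambda x: all(xi == x[0] for xi in x)
print(find_satisfying_assignment(f, 6))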
***
(I left some of the final steps as exercises both because they're nifty, manageable puzzles, and because they're not distinctively complexity-theoretic.)
What are the methodological ideas illustrated here? Let's see:
-Don't single-mindedly pursue goals (like showing P = NP); look for more modest achievements (like finding unique satisfying assignments) that attempt to remove some important perceived difficulty (here, 'interference' between multiple satisfying assignments).
-Sometimes seemingly modest goals, like this one, are essentially as difficult as the original problem. Keep this in mind, and be prepared not just to report success, but also to explain failure.
-Try stripping away problem details that one doesn't know how to exploit effectively (here, by 'forgetting' the logical structure of the input formula).
-When you don't know what to do, try acting randomly. Be prepared to sacrifice in the process: here, our reduction can't prove that producing unique assignments is hard from the assumption P != NP; we must resort to the assumption RP != NP.
-Limited randomness is often almost as effective as full randomness, and much cheaper. k-wise independence (and, in other settings, almost-independence) is one such tool; 'min-entropy' is another. On the other hand, it can be conceptually simpler to initially design algorithms with full randomness in mind (as we did), then inspect the analysis to see the extent to which derandomization is possible. In fact, more randomness-efficient versions of the V-V Theorem do exist (I will try to track down the papers).
I'm thinking about a 'Paths to Discovery' sequel on hardness-of-learning results, another area (also bearing the mark of Les Valiant) where the theorems are ultimately simple, but the challenge is determining the right form and target of the reduction. Any feedback, or other suggestions?
Labels: complexity
Comments:
Thanks, this was very interesting. I agree with most of the lessons you draw from this. But I do think it's possible to imagine more adventurous discovery paths. By referring to different aspects of the conceptual superstructure of our field, they reveal its philosophical richness.
After reading your story, I went and took a look at the abstract of the Valiant-Vazirani paper, and a phrase there suggested one such alternative path. Suppose we are concerned not with solving satisfiability, but with cryptography. The security of many cryptographic protocols is based on the existence of one-way functions - functions that are easy to compute but hard to invert on average. Most of the candidate one-way functions considered in the early 80s were in fact one-way permutations, with rich algebraic structure. This simplified the construction of protocols, but of course based them on what might be a stronger hardness assumption. So it would be very interesting to directly construct one-way permutations from one-way functions - in some cases, this would give us cryptographic functionality from a weaker assumption, and in other cases it would simplify the construction of protocols.
Valiant-Vazirani can, in retrospect, be seen as a failed attempt to construct one-way permutations from one-way functions. The range of a one-way function (parameterized appropriately) is in NP, and the range of a one-way permutation is in Unique NP, so Valiant-Vazirani is an analogue of what we really need. Kahn, Saks and Smyth (Complexity '00) later showed, by proving a conjecture of Rudich, that a "black-box" argument of the sort used by Valiant-Vazirani can't be used for the cryptographic analogue. Well, failures can be just as interesting as successes in our field...
Another path is revealed by looking at the constructive aspect of a reduction rather than its interpretation as proving hardness. Suppose we focus not on solving SAT but on recovering a solution given an algorithm for SAT. It is easy to do this sequentially using self-reducibility, but the sequentiality seems fundamental. To me, the non-triviality of Valiant-Vazirani is in the way it contradicts this intuition. It's easy to find a solution in parallel for Unique-SAT, and by composing the reduction from SAT to Unique-SAT with the solution-finding procedure, we can do the same for SAT. The promising thing about this is that it points the way to finding parallel solutions for problems that we actually believe we can do in parallel, as Mulmuley, Vazirani and Vazirani so elegantly demonstrated a little later.
From a historical point of view, it's suggestive that Vijay Vazirani's main areas of interest early on were cryptography and matching algorithms. These seem utterly unrelated, and yet...
I'd love to see a sequel. I have two suggestions, one of which I am sure you considered yourself.
The first is the Vazirani-Vazirani paper "A natural encoding scheme proved probabilistic-polynomial complete". This is one of those papers that has completely faded from consciousness (Umesh doesn't list it on his webpage), but I thought it was interesting the one time I looked at it, and as far as I know, it was the first natural example of a probabilistic reduction.
The second is the Goldreich-Goldwasser-Micali paper on pseudo-random functions. It showed the way to using cryptography in constructing reductions for natural problems (as exploited later by Razborov-Rudich), but it's also written with such a cogent philosophical motivation. I like the way it emphasizes how "random-looking" functions can be constructed from functions like Factoring that are not random-looking at all. Also, it uses the hardness-of-learning link to draw the lesson that physicists cannot be automated, just as NP != P can be used to draw the lesson that mathematicians cannot be automated :)
By Anonymous, at 7:02 PM
Thanks, Rahul! I'd be interested to read that other V-V paper, if you ever find it online.
I agree that there are other, and more adventurous, paths to Valiant-Vazirani, namely, any of its several applications. Certainly parallel, nonadaptive witness construction is one of its most impressive and surprising applications, perhaps second only to the role of Isolation in Toda's Theorem.
Thanks also for pointing out the possibility that V-V could've come out of a desire to weaken crypto hardness assumptions. In fact, I'm pretty sure I've tried clumsily to use Isolation for just that purpose (show OWF ==> OWP), not to mention for showing NP-hardness of UP and other nonstarters.
Cryptography is certainly an inspiring source of cool reductions, with hardness-randomness tradeoffs high on the list. I still have yet to read many of the key '80s crypto-theory papers, and if anyone has a top-5 or top-10 list I'd greatly appreciate it.
By Andy D, at 1:51 AM