Using Randomness to Derandomize
The 'probabilistic method' is everywhere in combinatorics. There are plenty of good expositions, with Alon and Spencer's among the juiciest. I aim to describe an exquisite application due to Razborov in 1988 that seems little-known (perhaps because the paper's in Russian). I read about it in Jukna's 'Extremal Combinatorics', which I strongly recommend.
The Result
This application revisits one of the very first sites of the probabilistic method: the construction of Ramsey graphs. A graph is Ramsey if (speaking qualitatively, here as throughout) it is very large, yet contains no large cliques or independent sets. Erdos argued: pick a random graph by fixing the nodes and then including each edge independently with probability 1/2. Each 'large' subgraph H has only a 'very small' probability q_H of forming a clique or independent set, so by taking the union bound over all of these events, one for each large subgraph, the probability of any large clique or independent set appearing is at most the sum over all H of q_H. So, Ramsey graphs exist if we set the parameters so that this sum is less than 1. In fact the graph size can be set to 2^n while all clique/independent set sizes are held to poly(n).
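For concreteness, here is the standard back-of-the-envelope version of that calculation (the usual textbook parameter setting, nothing specific to Razborov's paper). With N = 2^n vertices and target size k, each fixed k-set is a clique or independent set with probability 2^(1 - k(k-1)/2), so

$$
\Pr[\text{some $k$-set is a clique or independent set}] \;\le\; \binom{N}{k}\, 2^{\,1-\binom{k}{2}} \;\le\; 2^{\,nk \,+\, 1 \,-\, k(k-1)/2} \;<\; 1 \quad \text{once } k \ge 2n+2,
$$

so already k = O(n) suffices.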
Razborov goes one up on Erdos with a significantly more explicit Ramsey construction. He shows how the representation can be made exponentially concise.
'Hold on,' you might say, 'I can do that: "The lexicographically first Ramsey graph on 2^n vertices" is an even more concise description--doubly-exponentially concise.' Sure; but the obvious shortcoming of this description is that it takes an incredible amount of time to unpack if I ask you the simplest thing about your graph--say, is node 2^n connected to node 2^(n-1)? Razborov's representation, by contrast, is a poly(n)-sized, constant-depth circuit that outputs the answers to precisely these edge-queries, on demand. The only thing your description has on his is that, though he shows his to exist, he doesn't exactly know how to find it...
Remarks
Why do I like this result so much? First, its proof uses graph expansion and the isolation lemma, two important tools I've enjoyed studying. It improves on the probabilistic method using... the probabilistic method, something that took me by surprise.
The theorem also shows that certain mathematical objects are much more accessible than they might appear. To drive this home, let me suggest a small strengthening of the result: as should become apparent, we get not just the guarantee that a small-circuit representation exists, but also a way to produce one with high probability. So, if you want a deterministic and simple instruction-kit, small enough to fit in your pocket, for building a Ramsey graph on 2^100 vertices edge by edge, rest assured that not only does such a kit exist but that you can (probabilistically) build it yourself.
For those familiar with the polynomial hierarchy (PH) in complexity theory: this result is related to (though not implied by) Sipser's result that BPP is contained in the second level of the PH, in the following way. Let L be a decision problem. Say we've designed a P-time algorithm for L that is almost deterministic, but relies on having oracle access to the adjacency matrix of an exponentially large Ramsey graph (it doesn't matter which one we're supplied). We could simulate the algorithm in BPP by randomly generating the graph as needed (a la Erdos); but, a la Razborov, we could also simulate the algorithm via a 2-round game: player 1 proposes a small circuit which he claims computes a Ramsey adjacency matrix; player 2 is given the opportunity to refute the circuit by exhibiting a poly(n)-sized clique or independent set. If the refutation fails, the players run the algorithm with the circuit as oracle. With slight modification, this shows that L is in both Sigma_2 and Pi_2, admittedly in a more complicated way than via Sipser's theorem.
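For concreteness, here is one way to write the Sigma_2 side of this game as a quantified formula (my own phrasing of the argument just sketched, not a quote from any paper); C ranges over poly(n)-sized circuits, G_C denotes the graph whose adjacency matrix C computes, S over poly(n)-sized vertex sets, and M is the deterministic oracle algorithm:

$$
x \in L \iff \exists C\; \forall S:\ \big(S \text{ is neither a clique nor an independent set of } G_C\big)\ \wedge\ M^{G_C}(x) \text{ accepts}.
$$

The forward direction uses the existence (via Razborov) of at least one valid small circuit C; the backward direction uses the fact that any unrefuted C really does encode a Ramsey graph, on which M's answer is guaranteed correct. The Pi_2 version says: for every proposed C, either some S refutes it or M^{G_C}(x) accepts.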
Of course, the same result holds if L needs any other type of object amenable to Razborov's method, although I should point out that some important structures guaranteed to exist by the probabilistic method are not so amenable, e.g. expander graphs, whose defining constraints involve much larger vertex sets than the Ramsey constraints. (Of course, we now have good small descriptions of expanders by other, more direct means.)
Finally, this result prefigures one of Razborov's most famous results, one that showed him to be not just a technical virtuoso but a true complexity visionary: the Natural Proofs work with Steve Rudich. The theorem under discussion implies, in the natural-proofs framework, that properties like 'G is Ramsey' (which meets the 'largeness' requirement and, to an extent depending on the strength of the Ramsey property, the 'constructivity' one as well) cannot be used in a natural proof against polynomial-sized circuits (or even the constant-depth ones used in the Ramsey construction). This is what Razborov-Rudich would lead us to expect: if the property could be used in such a natural proof, it would show strong limits on the possibility of one-way functions, pseudorandom generators, and mathematically secure cryptography.
Conclusion: two thumbs up for Razborov's result.
A Proof Sketch
First, we should be clear about one thing: like Erdos' proof, Razborov's is not really about graphs; it only uses them as a test case. The proof method is properly viewed from the standpoint of boolean logic. So, the 'graph' is now an adjacency matrix; the property 'G is Ramsey' becomes a conjunction of properties t_H, one for each large 'subgraph' H, stating that H isn't a clique or independent set; the goal is to produce an assignment to the variables that satisfies all the t_H. Erdos' proof solves this problem with the sole observation that each t_H is very likely to be satisfied by a random assignment. Razborov solves the problem in the stronger sense described above after making one (and only one) additional observation about the clauses t_H: each one quantifies over only a few variables--poly(n) of them, the edges within the subgraph H.
Then, viewed in full generality, our task is to produce a circuit computing a certain kind of function f on m variables; by Razborov's observation, the requirements imposed on f by the construction task are a collection of properties t[1](f), ..., t[k](f) which must all be satisfied, each of which depends on only poly(m) bits of f (and is highly likely to be satisfied by a random assignment to those bits).
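To see the Ramsey case fit this template (in notation of my own choosing): take m = 2n, let f(u,v) be the adjacency bit for vertices u, v in {0,1}^n, and for each vertex set H of the forbidden size s = poly(n) let

$$
t_H(f):\qquad \Big(\bigvee_{\{u,v\}\subseteq H} \lnot f(u,v)\Big)\;\wedge\;\Big(\bigvee_{\{u,v\}\subseteq H} f(u,v)\Big),
$$

i.e. H is neither a clique nor an independent set. Each t_H reads only the s-choose-2 = poly(m) truth-table entries f(u,v) with u, v in H, and a uniformly random assignment to those entries violates t_H with probability only 2^(1 - s(s-1)/2).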
Razborov sets up a randomized 'growth process' that builds up a circuit piece by piece. At each stage i we have a circuit C[i](x) computing a function F[i](x) on m variables x.
How does the growth process work? The bottom gate of the circuit is an XOR, and at every stage we just slap on a new argument to it. Each argument is an AND of an appropriate number of independently chosen random linear equations (XORs or negations of XORs) over the variables of x. So, this is a depth-3 circuit if you allow XOR in your basis and unbounded fanin.
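Here is a toy Python sketch of this growth process (my own illustration; the parameter choices below are arbitrary placeholders, whereas Razborov tunes the number of equations per AND to the Isolation Lemma, as described next):

```python
import random

def random_and_of_parities(m, num_eqs, rng):
    """One new argument for the bottom XOR gate: an AND of num_eqs random
    affine equations <a, x> = b over GF(2), in m variables."""
    return [([rng.randint(0, 1) for _ in range(m)], rng.randint(0, 1))
            for _ in range(num_eqs)]

def eval_and(clause, x):
    """Evaluate one AND-of-parities on the bit-vector x."""
    return all(sum(ai * xi for ai, xi in zip(a, x)) % 2 == b for a, b in clause)

def grow_circuit(m, stages, num_eqs, rng):
    """The growth process: at each stage, slap one fresh random
    AND-of-parities onto the bottom XOR gate."""
    return [random_and_of_parities(m, num_eqs, rng) for _ in range(stages)]

def eval_circuit(circuit, x):
    """F(x): XOR of all the AND arguments built so far."""
    return sum(eval_and(clause, x) for clause in circuit) % 2

# Toy usage: m = 10 variables, O(log m) equations per AND, poly(m) stages.
rng = random.Random(0)
C = grow_circuit(m=10, stages=200, num_eqs=6, rng=rng)
print(eval_circuit(C, [1, 0] * 5))
```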
The process doesn't converge quickly to a uniform distribution on functions (since most functions don't have small circuits), *but* when we restrict our attention to any fixed poly(m) bits of the output, we do get such convergence. Thus, we can terminate the growth process after poly(m) steps, and the probability distribution on the terminal function f(x) = F[poly(m)](x) is sufficiently 'locally uniform' that each property t[j](f) is satisfied with very high probability. Then we apply the union bound and conclude that with nonzero probability over all runs, f(x) satisfies all of them. So the only remaining task is to explain why this fast 'local' convergence to uniform occurs.
So fix any set K of poly(m) input vectors to the circuit (such as the ones an individual property t[j](f) depends on). Let's focus on what a particular AND-of-XORs does on inputs from K. The Isolation Lemma of Valiant & Vazirani (see Papadimitriou's textbook) kicks in to give us everything we need: if we choose the number of equations right (and we only need O(log(m)) of them), there is a decent chance (at least 1/8) that exactly one of the vectors in K will satisfy all of the equations, and in this case each vector in K is equally likely to be the one left standing.
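To get a feel for this, here is a small empirical check of the isolation phenomenon (my own toy code, not from the paper): fix a small set K, draw roughly log2(|K|) + 2 random affine equations, and count how often exactly one member of K satisfies them all.

```python
import random

def isolation_frequency(K, num_eqs, trials, rng):
    """Estimate Pr[exactly one vector in K satisfies all num_eqs random
    affine equations over GF(2)]; Valiant-Vazirani guarantees this is at
    least about 1/8 when num_eqs is roughly log2(|K|) + 2."""
    m = len(K[0])
    hits = 0
    for _ in range(trials):
        eqs = [([rng.randint(0, 1) for _ in range(m)], rng.randint(0, 1))
               for _ in range(num_eqs)]
        survivors = [v for v in K
                     if all(sum(ai * vi for ai, vi in zip(a, v)) % 2 == b
                            for a, b in eqs)]
        hits += (len(survivors) == 1)
    return hits / trials

# Toy usage: 16 distinct vectors in {0,1}^10, and log2(16) + 2 = 6 equations.
rng = random.Random(1)
K = list({tuple(rng.randint(0, 1) for _ in range(10)) for _ in range(20)})[:16]
print(isolation_frequency(K, num_eqs=6, trials=20000, rng=rng))
```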
Thus, for each vector v in K, each AND we construct has a 1/poly(m) chance of behaving, on input from K, like the function I_v(x) which is 1 iff x = v. These I_v, in turn, collectively form a *basis* for the vector space V of all boolean functions on K, with XOR of functions as the addition operation.
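(Explicitly: any function g on K is the XOR of the indicators of the points where it equals 1,

$$
g \;=\; \bigoplus_{v \in K \,:\, g(v) = 1} I_v,
$$

so the I_v span V, and they are clearly linearly independent.)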
Think about this vector space V for functions on K as a hypercube of poly(m) dimensions, one for each I_v. At each stage of the growth process, we've got a circuit which induces a function on K, that is, a point in V. And in each phase, adding an argument to the bottom XOR, we make a jump to another point in this space, a jump which is independent of where we are in V so far. The question is, does this walk converge quickly (in poly(m) steps) to a nearly-uniform distribution?
Answer: yes. It's a somewhat delicate exercise in probability, but the intuition is: when Isolation succeeds, the distribution is pushed closer to uniform to the same extent it would be in a simple random walk in V (since the resulting step is uniform over the 'edges' indexed by the I_v's, by the symmetry of Isolation). Isolation succeeds at least 1/8th of the time, so using the good expansion properties of the hypercube (see a previous post) and the expansion-convergence connection, we're golden, *as long as* the distribution is not made less uniform by the steps in which Isolation fails. But even in these cases, we're XORing with a new function that is independent of what has come before, so the resulting effect cannot carry us further away from uniform (XOR is a 'smoothing' operation from this perspective).
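For intuition about the convergence rate, here is a toy exact simulation of an idealized version of the walk (my own simplification: a successful Isolation step XORs a uniformly random basis vector I_v, and a failed step is modeled as XORing the zero function, i.e. staying put; the real process's failed steps XOR some independent nontrivial function, which, as argued above, can only help). It evolves the exact distribution on V = {0,1}^K and reports its total variation distance from uniform.

```python
def lazy_walk_tv_distance(dim, steps, p_success=1.0 / 8):
    """Exact evolution of a lazy random walk on the hypercube {0,1}^dim:
    with probability p_success XOR a uniformly random coordinate vector,
    otherwise stay put.  Returns the total variation distance from the
    uniform distribution after `steps` steps."""
    n_states = 2 ** dim
    dist = [0.0] * n_states
    dist[0] = 1.0                     # start at the all-zeros function on K
    for _ in range(steps):
        new = [d * (1.0 - p_success) for d in dist]
        for state, mass in enumerate(dist):
            share = mass * p_success / dim
            for i in range(dim):
                new[state ^ (1 << i)] += share
        dist = new
    uniform = 1.0 / n_states
    return 0.5 * sum(abs(d - uniform) for d in dist)

# Toy usage: |K| = 8, so V has 2^8 = 256 points; the distance drops fast.
for steps in (0, 50, 200, 800):
    print(steps, round(lazy_walk_tv_distance(dim=8, steps=steps), 6))
```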
This concludes the proof outline.