Constructive Chernoff Bounds
'Chernoff bounds' state that large sums of independent random variables behave very predictably, with outcomes clustering tightly around the mean. In the simplest case, independent unbiased coin-flips, they just state that 'most' bitstrings of length n have roughly n/2 1's. These are the 'Chernoff bounds' I will be discussing.
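To fix one standard concrete form (the Chernoff-Hoeffding bound): for n independent fair coin-flips, the probability that the number of 1's deviates from n/2 by t or more is at most 2*exp(-2t^2/n). Here's a quick numerical sanity check in Python; the function names and parameter choices are mine, purely for illustration.

    from math import comb, exp

    def exact_tail(n, t):
        # Exact fraction of length-n bitstrings whose number of 1's
        # deviates from n/2 by at least t.
        return sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= t) / 2 ** n

    def hoeffding(n, t):
        # The Chernoff-Hoeffding upper bound on that same fraction.
        return 2 * exp(-2 * t * t / n)

    n = 1000
    for t in (25, 50, 100):
        print(t, exact_tail(n, t), hoeffding(n, t))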
These bounds are incredibly useful; in computer science, they spring up constantly in the analysis of randomized algorithms and processes. Is there any way for CS to repay this debt to classical mathematics, by shedding new light on Chernoff bounds?
Such bounds are already more or less perfected from the point of view of quantitative strength; what the CS perspective might emphasize is constructive proofs of the bounds. What's a constructive proof? Your interpretation is as good as mine, so before reading further, try proposing one and applying it to the question at hand.
Chernoff bounds are one among a whole genus of theorems comparing the sizes of sets. Take two sets A, B; I will take a constructive proof that |A| >> |B| to mean an efficiently computable injective map from B x C into A, where C is a large set (and ideally a hypercube, so that the order of magnitude in the comparison of A and B is most apparent).
Now, I claim that linear error-correcting codes (ECCs), whose existence and quality are shown using Chernoff bounds, can be turned around and seen as constructive proofs of such bounds.
Here's why. First, recall that a linear ECC is a map from {0,1}^n to {0,1}^(rn), r > 1, given by a matrix transformation A, with all arithmetic taken mod 2. The validity of the code consists in its being injective (A has rank n); the error-correcting property consists in a high minimum Hamming distance between distinct codewords Ax and Ay. Easy linear algebra shows that this minimum distance is just the minimum Hamming weight of any nonzero codeword: the distance between Ax and Ay is the weight of A(x + y), and A0 = 0. Say this minimum weight is dn, with d > 0.
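To make this concrete, here's a toy Python sketch (not any particular code from the literature; the parameters are mine): sample a random generator matrix mod 2 and compute its minimum distance by brute force. The classical Chernoff-based argument is exactly what guarantees that, for suitable r, a random A achieves distance dn with d > 0 with high probability.

    import itertools, random

    random.seed(0)
    n, r = 6, 3                  # message length n; codewords have length rn
    rn = r * n
    A = [[random.randrange(2) for _ in range(n)] for _ in range(rn)]

    def encode(x):
        # The codeword Ax over GF(2): each output bit is a parity of input bits.
        return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)

    # For a linear code, minimum distance = minimum weight of a nonzero codeword;
    # it is nonzero exactly when A is injective (has rank n).
    dn = min(sum(encode(x)) for x in itertools.product((0, 1), repeat=n) if any(x))
    print("minimum distance:", dn)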
Let V be the set of nonzero bitvectors of length n;
let L be the bitvectors of length rn and weight less than dn/2;
and let H be the bitvectors of length rn and weight at least dn/2. (So, L for 'light', H for 'heavy'.)
Define a map F from V x L into {0,1}^(rn) by F(x, z) = Ax + z, with bitvector addition mod 2. Clearly F is efficiently computable given A.
Claim: this map is injective. It is clearly injective for fixed x. To show F(x, z) != F(x', z') for x != x': if Ax + z = Ax' + z', then A(x + x') = z + z'; but the left side has weight at least dn, while the right side has weight less than dn/2 + dn/2 = dn. So z and z' have insufficient Hamming weight to bridge the gap between codewords.
Now further observe that, since x is taken from V and so is nonzero, Ax has weight at least dn, and hence F(x, z) has weight greater than dn - dn/2 = dn/2; so F maps into H. We conclude |H| >= |V|*|L| = (2^n - 1)*|L|. Since |H| <= 2^(rn), this means L occupies only an exponentially small (in n) fraction of the boolean rn-cube.
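All of this can be checked by brute force at toy scale. A minimal Python sketch, assuming nothing beyond the definitions above (the parameters and names are mine; A is resampled until it is injective):

    import itertools, random

    random.seed(1)
    n, r = 4, 3
    rn = r * n
    weight = sum                 # Hamming weight of a 0/1 tuple

    # Resample A until the code is injective (nonzero minimum weight).
    while True:
        A = [[random.randrange(2) for _ in range(n)] for _ in range(rn)]
        def encode(x):
            return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)
        dn = min(weight(encode(x)) for x in itertools.product((0, 1), repeat=n) if any(x))
        if dn > 0:
            break

    V = [x for x in itertools.product((0, 1), repeat=n) if any(x)]
    cube = list(itertools.product((0, 1), repeat=rn))
    L = [z for z in cube if 2 * weight(z) < dn]      # 'light': weight < dn/2
    H = [z for z in cube if 2 * weight(z) >= dn]     # 'heavy': weight >= dn/2

    def F(x, z):
        # F(x, z) = Ax + z, coordinatewise mod 2.
        return tuple((c + b) % 2 for c, b in zip(encode(x), z))

    images = {F(x, z) for x in V for z in L}
    assert len(images) == len(V) * len(L)            # F is injective
    assert all(2 * weight(w) >= dn for w in images)  # every image lands in H
    print(f"|H| = {len(H)} >= |V|*|L| = {len(V) * len(L)}")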
Combining this with a very similar construction for vectors of very high weight (note, for instance, that complementing every bit is a bijection between vectors of weight w and vectors of weight rn - w), we get what I would describe as a constructive Chernoff bound.
Notes
First, linear ECCs are not the only way to go here. One can also aim to show the comparative smallness of a vector set like L by finding a compression scheme for L. Then one can convert the resulting short strings into members of H by padding with an appropriate number of 0's and 1's.
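To illustrate the compression idea (the encoding here is my own choice, purely for the sketch): a length-m vector with at most w ones is determined by the positions of its ones, which take only about w*log2(m) bits in total.

    from math import ceil, log2

    def compress(v, max_ones):
        # List the positions of the 1's as fixed-width fields; the
        # out-of-range value len(v) serves as a "no more 1's" filler.
        m = len(v)
        field = ceil(log2(m + 1))
        positions = [i for i, b in enumerate(v) if b]
        positions += [m] * (max_ones - len(positions))
        return ''.join(format(p, f'0{field}b') for p in positions)

    def decompress(s, m):
        field = ceil(log2(m + 1))
        v = [0] * m
        for i in range(0, len(s), field):
            p = int(s[i:i + field], 2)
            if p < m:
                v[p] = 1
        return v

    v = [0] * 24
    v[5] = v[17] = 1
    c = compress(v, max_ones=3)
    assert decompress(c, 24) == v
    print(len(v), "bits down to", len(c))   # 24 -> 15; the gap widens with length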
Second, there is a connection between this use of ECCs and the result of Razborov I described in the post 'Using Randomness to Derandomize'. We can express the existence of an injective map from V x L into H as an exponentially long (in n) boolean formula in conjunctive normal form; the clauses disallow every possible collision, one by one. By (classical) Chernoff bounds, the formula is satisfiable if d, r are appropriately chosen; indeed a random assignment is highly likely to satisfy it, each clause depends on only O(n) variables, and each clause individually is exceedingly likely to be satisfied. Thus the hypotheses of Razborov's result are met, and the theorem predicts there is a poly(n)-sized constant-depth circuit with parity gates describing a satisfying assignment. Lo and behold--the linear ECC given by matrix A describes just such an assignment, and its linearity allows it to be computed in constant depth with parity gates!
Third: you may object that the map F by itself isn't sufficient as a constructive proof, since it comes with no proof of its validity. However, the reasoning needed to establish F's properties is pretty elementary, with the most involved step probably being the proof that A is injective. But certainly questions remain here, and again there is no single best notion of constructive proof to work with.
Two concluding remarks. One: linear ECCs are distinctive in their nature and applications (see also my 'Pseudorandomness' post), and should not just be considered a particular realization of the ECC concept.
Two: the relation between computer science and constructive mathematics is a rich area that extends far beyond this little taste (and beyond my own knowledge). More to come on this subject when I get the time.
Labels: complexity