## Thursday, September 21, 2006

### Invitation to Inanity

Tired of work, but too lazy to get up? Try this little game:

Take a thin pen with a cap. Uncap it and place pen and cap in the palm of your hand. Now, shut your eyes and recap the pen, without the aid of your other hand or any other object or surface, and without marking yourself.

With my pen (a Pilot Precise V5, cheap and common), hand size, and dexterity level, this has proved surprisingly fun and challenging.

Play at your own risk, obviously. As a lefty, my hand is usually ink-smudged just from writing, so it's no big deal, but watch out for the clothes.


## Sunday, September 17, 2006

### Puzzle Time

The women of Delta Epsilon were a popular bunch, but a smart bunch too--smart enough to want more from the dating scene at their state U. First of all they were sick of that determined, smooth-talking type, the guy who'd hit on one sorority sister after another until one finally expressed an interest--unaware, perhaps, that she was number seven on his list.

Then, too, there were those shy, sensitive boys out there (frequently mathematicians); this kind of guy could be a real mensch, but it took him forever and a day to declare his affections.

The first type was learning too much about the feelings of women he didn't really care for; the second type was held back by his own fear of self-disclosure. It seemed clear that, in principle, both problems could be solved by a single mechanism: some way for a particular boy to inquire about one, and only one, sister's feelings about him (presumably his favorite; and we assume her feelings can be summarized as like/dislike), *without* the sisters learning whom he asked about.

There was no third party on campus the sisters trusted to help carry out this function, and it seemed like a pipe dream. But one day, one of them read an intriguing ad for a company called Discreet Dachshunds, Inc. An inquiry revealed that they sold dogs trained in a special skill: their owner could whisper 'yes' or 'no' into each of the dog's ears, and then send it into another room to meet with a visitor. The visitor could reach out to shake the dog's left or right paw (visitor's choice); the dog would shake its head yes or no to repeat to the visitor what had been whispered into its corresponding ear by the owner. After one such transaction, the dog would scamper back to its owner, immediately forgetting what had transpired.

This dog would be perfect if there were only two sorority sisters, A and B: if a boy wanted to make a discreet inquiry, A would whisper her feelings about him into the dog's left ear, while B used the right ear, and the boy could learn exactly one of these pieces of information in his session with the dog.

However, Delta Epsilon had a sizeable and growing membership. The sisters asked about dogs capable of carrying more information at once, but Discreet Dachshunds replied that sadly, two bits was the upper limit with these animals--and anyway, dogs have only two paws for shaking.

Nevertheless, after careful deliberation, the sisters bought a dachshund from the company and used it cleverly to implement their system. They reaped the romantic benefits, and the dog was treated like royalty.

How did they do it?

**Notes**

i) Apologies for heteronormativity.

ii) I'm not sure how difficult readers will find this puzzle. A 3-woman sorority is the place to start experimenting.

iii) The dachshund in this puzzle corresponds to a functionality called '1-out-of-2 oblivious transfer (OT)' in the cryptographic literature; the result alluded to by the puzzle is '1-out-of-2 OT simulates 1-out-of-k OT'. In fact, OT can do much, much more than this, composing to form more complex cryptographic protocols in much the way {AND, OR, NOT} compose to form all boolean functions; and complexity theory offers hypotheses under which it's possible to do OT without the dog. See the Wiki page for history and references.

Extra credit 1: The sisters decide that instead of like/dislike, they need to express their feelings about a guy using b > 1 bits each. Once again, he can learn about only one sister's feelings; the others' feelings must remain completely inviolate. Of course, the dog's abilities are unchanged.

Extra credit 2: Best physical realization of the OT mechanism? Quantum mechanics offers one, but that's cheating.
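For concreteness, the dachshund's functionality can be modeled in a few lines of code. This is a minimal sketch (the class and method names are my own invention, not anything from the puzzle or the crypto literature), capturing the three guarantees: two bits go in, one bit comes out by the visitor's choice, and the dog forgets everything afterward:

```python
class Dachshund:
    """A toy model of 1-out-of-2 oblivious transfer (the trained dog)."""

    def __init__(self):
        self._ears = None

    def whisper(self, left_bit, right_bit):
        # The owner loads exactly two bits, one per ear.
        assert left_bit in (0, 1) and right_bit in (0, 1)
        self._ears = (left_bit, right_bit)

    def shake(self, paw):
        # The visitor chooses 'left' or 'right'; the dog reveals that bit
        # only, then immediately forgets both. The owner never learns
        # which paw was shaken.
        assert self._ears is not None, "nothing whispered yet"
        bit = self._ears[0] if paw == 'left' else self._ears[1]
        self._ears = None  # the dog scampers off and forgets
        return bit

# One transaction: A whispers 1 (likes him), B whispers 0 (doesn't).
dog = Dachshund()
dog.whisper(1, 0)
print(dog.shake('left'))  # the boy learns only A's bit: 1
```

Note that the amnesia is what makes repeated transactions safe: no record survives that could correlate one boy's inquiry with another's.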


## Monday, September 04, 2006

### Constructive Chernoff Bounds

'Chernoff bounds' state that large sums of independent variables are very predictable in their behavior, with outcomes clustering tightly around the mean. In the simplest case, independent unbiased coin-flips, they just state that 'most' bitstrings of length n have roughly (n/2) 1's. These are the 'Chernoff bounds' I will be discussing.
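As a sanity check on that simplest case, one can compare the exact binomial tail against the standard Chernoff/Hoeffding bound 2·exp(-2t²/n) for unbiased coin flips (a rough sketch; the constants differ across versions of the bound):

```python
from math import comb, exp

n = 1000
t = 50  # deviation from the mean n/2

# Exact fraction of n-bit strings whose number of 1's deviates from n/2
# by more than t, computed from binomial coefficients.
tail = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) > t) / 2 ** n

# The Chernoff/Hoeffding bound on that fraction.
bound = 2 * exp(-2 * t ** 2 / n)

print(tail, bound)  # the exact tail is tiny and lies below the bound
```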

These bounds are incredibly useful; in computer science, they spring up constantly in the analysis of randomized algorithms and processes. Is there any way for CS to repay this debt to classical mathematics, by shedding new light on Chernoff bounds?

Such bounds are already more or less perfected from the point of view of quantitative strength; what the CS perspective might emphasize is constructive proofs of the bounds. What's a constructive proof? Your interpretation is as good as mine, so before reading further, try proposing one and applying it to the question at hand.

Chernoff bounds are one among a whole genus of theorems comparing the sizes of sets. Take two sets A, B; I will take a constructive proof that |A| >> |B| to mean an efficiently computable injective map from B x C into A, where C is a large set (and ideally a hypercube, so that the order of magnitude in the comparison of A and B is most apparent).

Now, I claim that linear error-correcting codes (ECCs), whose existence and quality are shown using Chernoff bounds, can be turned around and seen as constructive proofs of such bounds.

Here's why. First, recall that a linear ECC is a map from {0,1}^n to {0, 1}^(rn), r > 1, given by a matrix transformation A where all arithmetic is taken mod 2. The validity of the code consists in its being injective (A has rank n); the error-correcting property of the code consists in its having high minimum distance between any two codewords Ax and Ay. Easy linear algebra shows that this minimum distance is just the minimum Hamming weight of any nonzero codeword (note A0 = 0). Say this minimum weight is dn, with d>0.
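A tiny concrete instance: the [7,4] Hamming code, a linear map from {0,1}^4 to {0,1}^7 with minimum distance 3. The sketch below (using one standard choice of generator matrix) checks the minimum-weight characterization by brute force:

```python
import itertools

# Generator matrix of the [7,4] Hamming code (one standard choice):
# encodes 4 message bits into 7, with minimum distance 3.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(x):
    # Codeword = x * G, with all arithmetic mod 2.
    return tuple(sum(x[i] * G[i][j] for i in range(4)) % 2 for j in range(7))

# By linearity, the minimum distance equals the minimum Hamming weight
# of a nonzero codeword (A0 = 0, and Ax + Ay = A(x + y)).
min_weight = min(
    sum(encode(x))
    for x in itertools.product([0, 1], repeat=4)
    if any(x)
)
print(min_weight)  # → 3
```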

Let V be the set of nonzero bitvectors of length n;

let L be the bitvectors of length rn and weight less than dn/2;

and let H be the bitvectors of length rn and weight at least dn/2. (So, L for 'light', H for 'heavy'.)

Define a map F from V x L into {0,1}^(rn) by F(x, z) = Ax + z, with bitvector addition mod 2. Clearly F is efficiently computable given A.

Claim: this map is injective. It is clearly injective for fixed x (z is recoverable as F(x, z) + Ax). To show F(x, z) != F(x', z') for x != x': Ax and Ax' are at distance at least dn, while z and z' each have weight less than dn/2, so together they cannot bridge the gap.

Now further observe that, since x is taken from V and so is nonzero, Ax has weight at least dn; adding z can cancel fewer than dn/2 of those 1's, so F(x, z) has weight at least dn/2, hence maps into H. We conclude |H| >= |V|*|L| = (2^n - 1)*|L|. This means L occupies only an exponentially small (in n) fraction of the boolean rn-cube.
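The whole construction can be verified by brute force at toy scale. The sketch below uses the [7,4] Hamming code (minimum distance 3, so 'light' means weight below 3/2) as the matrix A, and confirms both the injectivity of F and that every image lands in H:

```python
import itertools

# Generator matrix of the [7,4] Hamming code; minimum distance dn = 3.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(x):
    return tuple(sum(x[i] * G[i][j] for i in range(4)) % 2 for j in range(7))

V = [x for x in itertools.product([0, 1], repeat=4) if any(x)]       # nonzero messages
L = [z for z in itertools.product([0, 1], repeat=7) if sum(z) < 1.5]  # weight < dn/2

def F(x, z):
    # F(x, z) = Ax + z over GF(2).
    c = encode(x)
    return tuple((c[j] + z[j]) % 2 for j in range(7))

images = [F(x, z) for x in V for z in L]

assert len(set(images)) == len(V) * len(L)   # F is injective
assert all(sum(v) >= 1.5 for v in images)    # every image lands in H
print(len(V) * len(L), 2 ** 7)               # → 120 128
```

So already at this scale, |H| >= 120 out of the 128 vectors in the 7-cube.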

Combining this with a very similar construction for vectors of very high weight, we get what I would describe as a constructive Chernoff bound.

**Notes**

First, linear ECCs are not the only way to go here. One can also aim to show the comparative smallness of a vector set like L by finding a compression scheme for L. Then one can convert the resulting short strings into members of H by padding with an appropriate number of 0's and 1's.

Second, there is a connection between this use of ECCs and the result of Razborov I described in the post 'Using Randomness to Derandomize'. We can express the existence of an injective map from V x L into H as an exponentially long (in n) boolean formula in conjunctive normal form; the clauses disallow every possible collision, one by one. By (classical) Chernoff bounds, it's a satisfiable formula, highly likely to be satisfied by a random assignment (if d, r are appropriately chosen), and each clause depends only on O(n) variables and is exceedingly likely to be satisfied. Thus the hypotheses of Razborov's result are met, and the theorem predicts there is a poly(n)-sized constant-depth circuit with parity gates describing a satisfying assignment. Lo and behold--the linear ECC given by matrix A clearly describes the assignment, and its linearity allows it to be computed in constant depth with parity gates!

Third: you may object that the map F by itself isn't sufficient as a constructive proof, since there's no proof of its validity. However, the reasoning involved to establish F is pretty elementary, with the most involved step probably being the proof that A is an injective map. But certainly questions remain here, and again there is no single best notion of constructive proof to work with.

Two concluding remarks. One: linear ECCs are distinctive in their nature and applications (see also my 'Pseudorandomness' post), and should not just be considered a particular realization of the ECC concept.

Two: the relation between computer science and constructive mathematics is a rich area that extends far beyond this little taste (and beyond my own knowledge). More to come on this subject when I get the time.


Labels: complexity

### Coding and the XOR Lemma

This is a note on Andrew Yao's XOR Lemma, a famous complexity result with several proofs that are well-exposited here. I won't prove the Lemma; instead I will discuss its slightly subtle relation to more recent coding-based hardness-amplification results, a relation explored in an article by Luca Trevisan. The coding and hardness-amplification background I'm assuming can be gleaned from Trevisan's excellent survey on the subject. This note just organizes a little of what I've learned.

Briefly, hardness amplification aims to take as hypothesis a function f that is 'slightly' hard to compute, and identify a function g that is 'very' hard to compute. Since we know complicated functions exist out there somewhere (Shannon's counting argument), we want g to be explicitly determined by f.

Coding-based approaches to this problem work as follows: take such an f, a finite function not computed exactly by any 'small' circuit. Let g's truth-table be an error-correcting encoding of the truth-table of f, under a (very) efficiently decodable code.

Suppose g were even approximately computable by a small circuit A. Then applying the decoder to the output of A would give us a small (but slightly larger) circuit that 'recovers' all of the bits of f (on demand) from its partially corrupted encoding in A. This contradicts the presumed hardness of f.
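Here is a toy rendering of that decoding argument, with a 5-fold repetition code standing in (deliberately crudely) for the efficiently decodable codes used in the real results. Even after an adversary corrupts 2 of every 5 copies, each bit of f is locally recoverable by majority vote:

```python
# f's truth table as a bitstring (f on 3 inputs), encoded bit-by-bit
# with a 5-fold repetition code -- a simple stand-in for the efficiently
# decodable codes used in actual hardness amplification.

f_truth_table = [1, 0, 0, 1, 1, 0, 1, 0]

def encode(bits, r=5):
    return [b for b in bits for _ in range(r)]

def decode_bit(code, i, r=5):
    # 'Local decoding': recover f(i) by majority vote over its r copies,
    # tolerating up to (r - 1) // 2 corrupted copies.
    block = code[r * i : r * (i + 1)]
    return int(sum(block) > r // 2)

g = encode(f_truth_table)

# An adversary (the approximating circuit A) corrupts 2 of every 5 copies.
corrupted = list(g)
for i in range(len(f_truth_table)):
    corrupted[5 * i] ^= 1
    corrupted[5 * i + 1] ^= 1

recovered = [decode_bit(corrupted, i) for i in range(len(f_truth_table))]
print(recovered == f_truth_table)  # → True: f recovered despite 40% corruption
```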

Now let's contrast the situation with Yao's XOR Lemma, one form of which states roughly that if no 'small' circuit computes f correctly on more than a 1/2 + d fraction of all inputs, then no (slightly smaller) small circuit computes g(x, y) = f(x) + f(y) (mod 2) on more than a 1/2 + O(d^2) + e fraction of inputs, where e is a negligible function of n. Iterating this lemma multiple times to get a big XOR, the result is a very hard function g.

Why does this Lemma, and its variants, seem to require a stronger hardness assumption on f than the coding-based results (both in terms of the computational model--Yao requires hardness against nonuniform circuits--and in terms of the strength of hardness required)? Why are its proofs trickier? And finally, why is the Lemma still worth keeping around? The coding perspective sheds some light on these questions.

From this perspective, we are interested in the 'code' that takes the bitstring that is f's truth-table, and outputs g defined as above. Is this even a valid code? Nope--f and its negation both define the same g. Well, what if we restricted f's first bit to be a 1? Then encoding is unique; however, this isn't a very good error-correcting code; e.g., 1000000 and 1111111 have encodings g1, g2 that are quite close. This creates an inherent ambiguity that prevents us from obliviously applying a decoder algorithm to the g-approximating circuit A to get back to the exact original f, or even approximate the original f (since our example strings were far apart, no decoded string can approximate both of them at once).
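The non-injectivity is easy to check directly. The sketch below builds the XOR 'encoding' of a truth table and of its pointwise negation, and confirms the two encodings coincide:

```python
def xor_encoding(f):
    # g(x, y) = f(x) XOR f(y), flattened into a truth table over all pairs.
    n = len(f)
    return [f[x] ^ f[y] for x in range(n) for y in range(n)]

f = [1, 0, 0, 1, 1, 0, 1, 0]   # some truth table on 3 inputs
f_neg = [1 - b for b in f]     # its pointwise negation

print(xor_encoding(f) == xor_encoding(f_neg))  # → True: the 'code' is not injective
```

(Underlying identity: (1 - a) XOR (1 - b) = a XOR b, so negating every value of f leaves every XOR of a pair unchanged.)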

But maybe we can exactly recover the original f from A if we are given a little bit of 'advice', e.g. to discriminate between decoding to 1000000 and 1111111. Since we are in the nonuniform model, this advice can be built into our circuits. But how much advice do we need? It's got to be small to contradict the hardness of f.

The problem is, every bit in f influences a rather small fraction of bits in the encoding g; thus, any string f' close to f has an encoding g' close to g. If f is a function on n variables, there are then a doubly-exponential-in-n number of functions whose encodings are close to that of f, hence potentially represented by the approximating circuit A (here making hidden assumptions about what 'close' means...). So exponential advice is needed to specify f from among this collection--unacceptably large. This means our decoder can't hope to exactly recover f, and explains why the proof methodology doesn't let us assume that f is merely hard to compute exactly.

The compromise reached by the Lemma is to only approximately recover f (using advice, which our example indicated was necessary), corresponding to the stronger hardness assumption on f in the Lemma's statement. Here we enter the realm of information-theoretic possibility, but the need for this recovery to be algorithmically efficient leads to involved proofs, which necessarily use advice-driven case analysis. Is this coding theory? Trevisan calls it 'approximate list-decoding'. I'd say, sure it's decoding, but the scope of coding theory is being stretched significantly (and very fruitfully, it seems) in the process.

So why is Yao's XOR Lemma still a viable tool? I'm not qualified to answer this in full generality. Moreover, the Lemma has been generalized and improved in several directions. But two observations: first, once we've got a function f hard enough to meet the hypotheses of the XOR Lemma, the Lemma can be repeatedly applied until the resulting g is way hard; some but not all error-correcting codes can do a comparable job of pushing the hardness that far. Second, the XOR-encoding used is very efficient, more so than the error-correcting codes based on, e.g., Reed-Muller; this means that if we are trying to identify very hard functions in a complexity class C, and we assume f is taken from C, then the encoding g is more likely to be in C as well, so that we get hardness amplification within the class. Since the main target of hardness amplification research is giving evidence of really hard problems inside NP, Yao's Lemma looks to remain vital. It can't deliver the goods fully yet (e.g., NP doesn't seem to be closed under XOR), but new insights keep coming; the new FOCS 2006 paper by Impagliazzo-Kabanets-Jaiswal looks to be in this vein.

In conclusion, the XOR Lemma is an important result with somewhat difficult (but not impossible) proofs; the coding perspective illuminates the shape of the Lemma's statement and explains why it cannot easily be strengthened.

Note, however, that the coding barriers we encountered are methodological ones, and don't in themselves give evidence about the actual limits on the hardness-amplification properties of the XOR operation. On that subject my knowledge is slim.


Labels: complexity