Coding and the XOR Lemma
Briefly, hardness amplification starts from the hypothesis of a function f that is 'slightly' hard to compute and aims to identify a function g that is 'very' hard to compute. Since we know very hard functions exist out there somewhere (by Shannon's counting argument), the point is that g should be explicitly determined by f.
Coding-based approaches to this problem work as follows: take such an f, a finite function not computed exactly by any 'small' circuit. Let g's truth-table be an error-correcting encoding of the truth-table of f, under a (very) efficiently decodable code.
Suppose g were even approximately computable by a small circuit A. Then applying the decoder to the output of A would give us a small (but slightly larger) circuit that 'recovers' all of the bits of f (on demand) from its partially corrupted encoding in A. This contradicts the presumed hardness of f.
Now let's contrast the situation with Yao's XOR Lemma, one form of which states roughly that if no 'small' circuit computes f correctly on more than a 1/2 + d fraction of all inputs, then no (slightly smaller) circuit computes g(x, y) = f(x) + f(y) (mod 2) on more than a 1/2 + O(d^2) + e fraction of inputs, where e is a negligible function of n. Iterating this lemma to XOR together many independent copies of f yields a very hard function g.
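The 1/2 + O(d^2) term matches the information-theoretic 'piling-up' calculation: if each of two independent predictions of f is correct with probability 1/2 + d, their XOR is correct with probability (1/2 + d)^2 + (1/2 - d)^2 = 1/2 + 2d^2. A quick sketch of my own checking this arithmetic (a sanity check, not any part of the Lemma's actual proof, which must handle circuits rather than independent coin flips):

```python
import random

def xor_advantage(d, trials=200_000):
    """Estimate how often the XOR of two independent predictions is
    correct, when each prediction is independently correct with
    probability 1/2 + d."""
    correct = 0
    for _ in range(trials):
        a_ok = random.random() < 0.5 + d  # predictor right about f(x)?
        b_ok = random.random() < 0.5 + d  # predictor right about f(y)?
        # The XOR of the predictions is right iff both or neither is right.
        correct += (a_ok == b_ok)
    return correct / trials

d = 0.2
print(xor_advantage(d))               # empirically near 0.58
print((0.5 + d)**2 + (0.5 - d)**2)    # exactly 1/2 + 2*d^2 = 0.58
```

Note the advantage over 1/2 drops from d to 2d^2, which is a loss whenever d < 1/2; repeating the argument drives the advantage toward zero.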
Why does this Lemma, and its variants, seem to require a stronger hardness assumption on f than the coding-based results (both in terms of the computational model--Yao requires hardness against nonuniform circuits--and in terms of the strength of hardness required)? Why are its proofs trickier? And finally, why is the Lemma still worth keeping around? The coding perspective sheds some light on these questions.
From this perspective, we are interested in the 'code' that takes the bitstring that is f's truth-table and outputs the truth-table of g defined as above. Is this even a valid code? Nope--f and its negation define the same g. Well, what if we restricted f's first bit to be a 1? Then the encoding is unique; however, it isn't a very good error-correcting code: e.g., 1000000 and 1111111 have encodings g1, g2 that are quite close. This creates an inherent ambiguity that prevents us from obliviously applying a decoder algorithm to the g-approximating circuit A to get back the exact original f, or even an approximation to it (since our example strings were far apart, no single decoded string can approximate both of them at once).
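Both claims in this paragraph are easy to check mechanically. A small sketch of my own (the function names are mine, chosen for illustration): the XOR encoding of a length-n truth table t is the n-by-n table of all pairwise XORs t[x] XOR t[y]. A truth table and its bitwise complement encode identically, and 1000000 and 1111111, though they differ in six of seven positions, have encodings differing in only 12 of 49 positions.

```python
def xor_encode(t):
    """XOR encoding of a truth table t (a string of '0'/'1' bits):
    the flattened |t| x |t| table with entry t[x] XOR t[y]."""
    bits = [int(b) for b in t]
    return [a ^ b for a in bits for b in bits]

def complement(t):
    return ''.join('1' if b == '0' else '0' for b in t)

f = '1000000'
# A function and its negation produce the very same encoding:
assert xor_encode(f) == xor_encode(complement(f))

# Two far-apart truth tables whose encodings are close:
g1, g2 = xor_encode('1000000'), xor_encode('1111111')
dist = sum(a != b for a, b in zip(g1, g2))
print(dist, len(g1))  # 12 out of 49 positions differ
```

The closeness only improves with length: flipping one bit of a length-n truth table changes about a 2/n fraction of the encoding, so this fraction shrinks as n grows.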
But maybe we can exactly recover the original f from A if we are given a little bit of 'advice', e.g. to discriminate between decoding to 1000000 and 1111111. Since we are in the nonuniform model, this advice can be built into our circuits. But how much advice do we need? It's got to be small to contradict the hardness of f.
The problem is that every bit of f influences only a small fraction of the bits in the encoding g; thus any string f' close to f has an encoding g' close to g. If f is a function on n variables, there are then doubly-exponentially-many (in n) functions whose encodings are close to that of f, and hence are potentially represented by the approximating circuit A (here I'm making hidden assumptions about what 'close' means...). So exponentially many bits of advice are needed to specify f from among this collection--unacceptably large. This means our decoder can't hope to exactly recover f, and it explains why the proof methodology doesn't let us assume that f is merely hard to compute exactly.
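The doubly-exponential count can be made concrete: a function f on n variables has a truth table of N = 2^n bits, and the number of truth tables within relative Hamming distance delta of it is the Hamming-ball size sum over i <= delta*N of C(N, i), roughly 2^(H(delta)*N) for binary entropy H, i.e. exponential in N and so doubly exponential in n. A sketch tabulating this (delta = 1/4 is an arbitrary illustrative choice, not a parameter from the Lemma):

```python
from math import comb, log2

def log2_ball_size(n, delta=0.25):
    """log2 of the number of truth tables within relative Hamming
    distance delta of a fixed function on n variables."""
    N = 2 ** n  # truth-table length
    ball = sum(comb(N, i) for i in range(int(delta * N) + 1))
    return log2(ball)

# The log2 of the count roughly doubles each time n increases by 1,
# i.e. the count itself is doubly exponential in n.
for n in range(3, 8):
    print(n, round(log2_ball_size(n), 1))
```

So even writing down an index into this collection of candidate functions takes exponentially many bits, which is the advice blow-up described above.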
The compromise reached by the Lemma is to only approximately recover f (using advice, which our example indicated was necessary), corresponding to the stronger hardness assumption on f in the Lemma's statement. Here we enter the realm of information-theoretic possibility, but the need for this recovery to be algorithmically efficient leads to involved proofs, which necessarily use advice-driven case analysis. Is this coding theory? Trevisan calls it 'approximate list-decoding'. I'd say, sure it's decoding, but the scope of coding theory is being stretched significantly (and very fruitfully, it seems) in the process.
So why is Yao's XOR Lemma still a viable tool? I'm not qualified to answer this in full generality. Moreover, the Lemma has been generalized and improved in several directions. But two observations: first, once we've got a function f hard enough to meet the hypotheses of the XOR Lemma, the Lemma can be repeatedly applied until the resulting g is way hard; some but not all error-correcting codes can do a comparable job of pushing the hardness that far. Second, the XOR-encoding used is very efficient, more so than the error-correcting codes based on, e.g., Reed-Muller; this means that if we are trying to identify very hard functions in a complexity class C, and we assume f is taken from C, then the encoding g is more likely to be in C as well, so that we get hardness amplification within the class. Since the main target of hardness amplification research is giving evidence of really hard problems inside NP, Yao's Lemma looks to remain vital. It can't deliver the goods fully yet (e.g., NP doesn't seem to be closed under XOR), but new insights keep coming; the new FOCS 2006 paper by Impagliazzo-Kabanets-Jaiswal looks to be in this vein.
In conclusion, the XOR Lemma is an important result with somewhat difficult (but not impossible) proofs; the coding perspective illuminates the shape of the Lemma's statement and explains why it cannot easily be strengthened.
Note, however, that the coding barriers we encountered are methodological ones, and don't in themselves give evidence about the actual limits on the hardness-amplification properties of the XOR operation. On that subject my knowledge is slim.