### Pseudorandomness, Coding, and Complexity

What's shaking: I've graduated with a BA in Math, and I'm having a great summer in Amherst doing... largely more math. Let me describe a snippet. First, a broad, partial, and subjective overview of the body of research I have been (for many months now, but with many distractions) familiarizing myself with:

1. In the late 80s and 90s, a line of work based largely on a paper by Nisan and Wigderson showed how to derandomize algorithms based on a hardness assumption (see, e.g., Sanjeev Arora's textbook on his website, for this and other complexity theory too recent to be covered by my longtime favorite reference, Papadimitriou's). This involved a 'worst-case to average-case reduction': showing that if one problem is hard for small circuits in the worst case, then another one is hard for small(er) circuits to even approximately compute.

2. Prominent researchers like Sudan, Trevisan, and Vadhan (who gave one proof of the result alluded to) have promoted the view that such reductions are essentially corollaries of efficient encoding and decoding algorithms for error-correcting codes and some variants on this concept.

My take: On the one hand, they are clearly correct that these theorems can be derived as corollaries of said algorithms. It seems equally clear that this perspective is a very fertile one to take. My only reservation (which I'm sure these researchers share) is that we should still look for other ways of viewing these results, which might suggest new theorems in turn. Where should we look?

3. Error-correcting codes, I am becoming aware, are one creature among many in a loosely defined bestiary of combinatorial objects possessing 'pseudorandom' aspects (see Trevisan's survey 'Pseudorandomness and Combinatorial Constructions'). They live among hash families, expanders and extractors, low-discrepancy sets, pairwise-almost-independent functions, and on and on. What unites these objects? One or both of the following properties:

i) When we look for (X), we wish we had randomness. It is typically easy to randomly construct these objects, with good if not optimal parameters, with high probability. On the other hand, efficient deterministic constructions are, to varying degrees, 'tricky', or at least require types of math that weren't as much a part of the CS toolbox before the demand for these types of objects came around... regarding which demand,

ii) When we wish we had randomness, we might settle for (X). Randomized algorithms often use only a few of the likely properties of random sequences, and smaller 'pseudorandom' distributions might possess these properties too. For example (see the textbooks 'Communication Complexity' or 'Randomized Algorithms' for more on this one), say we want to check whether two databases on n bits are equivalent with minimal communication between them. Suppose we communicate random bits to select, completely at random, a Boolean function from n bits to k bits. With probability at least 1 - 1/2^k, two distinct databases will have distinct values under this random function and so will be 'told apart' by these signatures.

However, it'd take k*2^n bits to describe the function (a table of 2^n entries, k bits each), way more than just communicating the whole databases. What we want is a much smaller ensemble of 'pseudorandom' functions that behave like truly random ones at least for this purpose. The now-standard solution (at least in theoretical CS) is a concept called universal hashing.
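Here is a minimal sketch of that protocol in Python, using the standard 2-universal family h_{a,b}(z) = ((a*z + b) mod p) mod 2^k; the function names and parameter choices are mine, purely for illustration:

```python
import random

def _next_prime(n):
    """Smallest prime >= n (trial division; fine for small demos)."""
    while True:
        if n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1)):
            return n
        n += 1

def hashes_agree(x, y, n, k, rng):
    """Compare two n-bit databases (given as integers) using one random
    hash from the 2-universal family h_{a,b}(z) = ((a*z + b) % p) % 2**k.
    Describing (a, b) takes O(n) bits, versus k * 2**n bits for a truly
    random function from n bits to k bits."""
    p = _next_prime(1 << n)
    a = rng.randrange(1, p)
    b = rng.randrange(p)
    h = lambda z: ((a * z + b) % p) % (1 << k)
    # Equal inputs always agree; distinct inputs collide with
    # probability at most about 2/2**k.
    return h(x) == h(y)
```

With n = 16 and k = 8, the description of the hash fits in a few dozen bits, while the table of a truly random function would need 8*2^16 bits.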

What's great about these classes of objects is that, though some explicit constructions are really involved, there are interesting 'black-box reductions' between many of the families. There's a simple transformation, e.g., taking any sufficiently good expander graph and outputting a good linear error-correcting code. So there are natural 'affinities' between many properties of random structures that allow one to trade between them. I thoroughly recommend keeping a running 'reductions diagram' as one studies these objects. These reductions don't usually give the best parameters, but the mere fact that they exist seems important.

Now let me describe something I came up with (which, however, is probably not new). I think it both illustrates, in a distinct setting from the one described earlier, the 'Coding Thesis' that coding algorithms can undergird a lot of complexity theory, *and* the 'Substitutability Thesis', namely that pseudorandom objects can stand in for each other in many cases. It would be nice to have more examples where something other than coding theory stands in for coding, but mine brings coding in to replace hashing.

I will assume that readers are familiar with the result of Valiant & Vazirani that NP is contained in RP^(Parity-P). (If not, see Papadimitriou or Arora.) What always weirded me out is that the two proofs I'd read--based on random weight functions and random hyperplanes--both guaranteed the isolation of *unique* solutions with high probability, whereas one might think we could save randomness by settling for the guarantee of just *oddly many* solutions with high probability.

Well, I found an alternative proof (still incomplete) that indeed doesn't guarantee or use isolation. It's all about 'oddification'. The downside is, it doesn't seem to be more randomness-efficient, and, obviously, doesn't prove NP is in RP^USAT. It does, however, use the power of Parity-P in a more essential way than previous proofs.

The proof idea: Let input x of length n to an NP machine M be fixed. Consider the characteristic vector v of length m = 2^(poly(n)) that describes all accepting computations of M on x (assume there are some). Let A be an Rm x m 0-1 matrix for a good linear code (mapping m-bit vectors to Rm-bit vectors). R can be a constant, by an easy existence proof; I am assuming that there is a family of such A for which R is constant and individual entries of A can be computed in time O(log m). I will have to check this, but it seems likely to be true now or in the near future (perhaps using expander codes).

For linear codes, good minimum distance between codewords is equivalent to a high number of 1s in every nonzero codeword ENC(v) = Av (all arithmetic mod 2). So if v is nonzero, Av has many ones; I think we can get a constant fraction in our explicit family.
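As a tiny sanity check of that distance-equals-weight equivalence, one can enumerate all codewords of a concrete small code. I use the classical [7,4] Hamming code here (minimum distance 3) rather than the constant-rate family assumed above:

```python
import itertools

# Generator matrix of the [7,4] Hamming code; its minimum distance is 3.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(v):
    """ENC(v) = vG, all arithmetic mod 2."""
    return [sum(vi * g for vi, g in zip(v, col)) % 2 for col in zip(*G)]

# Because the code is linear, dist(ENC(u), ENC(w)) = weight(ENC(u + w)),
# so minimum distance = minimum weight over the 15 nonzero codewords.
min_weight = min(sum(encode(v))
                 for v in itertools.product([0, 1], repeat=4) if any(v))
```

Here min_weight comes out to 3. In the constant-rate families the post assumes, the analogous guarantee would be a constant fraction of the block length Rm.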

The algorithm: pick independently at random a few rows a1, ..., at of A (each index drawn from the range 1 ... Rm, so requiring only poly(n) random bits). Then each dot product ai*v can be determined with a Parity-P query (for ai*v = parity of the number of accepting computations whose A-matrix entry, in row i, is 1). If we pick enough rows and v isn't the zero vector, then with high probability at least one query will come out a 1. If v is zero, of course, all come out zero.
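The sampling loop above can be simulated directly at toy sizes. In this sketch a random 0-1 matrix stands in for the explicit code matrix A, and each parity query is computed by brute force in place of the Parity-P oracle; all names are illustrative:

```python
import random

def detect_nonzero(v, A, trials, rng):
    """Sample `trials` random rows a_i of A; each dot product a_i . v
    (mod 2) stands in for one Parity-P query.  Return True iff some
    query answers 1, i.e. we believe v != 0."""
    for _ in range(trials):
        a = A[rng.randrange(len(A))]
        if sum(ai * vi for ai, vi in zip(a, v)) % 2 == 1:
            return True
    return False

# Toy demo: m = 16, Rm = 64, random matrix in place of an explicit code.
rng = random.Random(1)
m, Rm = 16, 64
A = [[rng.randrange(2) for _ in range(m)] for _ in range(Rm)]

zero = [0] * m
v = [0] * m
v[5] = 1  # characteristic vector with one accepting computation
```

When v = 0 every query answers 0; when v != 0, a random matrix makes about half the rows answer 1, so a few dozen trials detect it with overwhelming probability. The point of using a good code instead of a random matrix is that *every* nonzero v is then guaranteed a constant fraction of 1-answers.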

More updates soon to come, hopefully; I'm also learning about Fourier analysis of Boolean functions, with applications to the class AC_0.

Labels: complexity
