Andy's Math/CS page

Friday, June 23, 2006

Graph Expansion and Concentration of Measure

I'm a 'low-class' mathematician. I'm happy to learn about phenomena in their most scaled-down form, the discrete, finite case. When I heard the 'counting-coins' interpretation of Lebesgue integration, I was satisfied and closed the book (until I was assigned the rest).

The downside is, I'm unable to properly appreciate a lot of analysis. The upsides are, i) being CS-oriented, the finite cases are largely adequate; ii) sometimes I can prove them myself.

Let me describe what I think I've figured out about the phenomenon called 'Concentration of Measure' after some casual reading about it in the continuous case, here worked out in my low-class terms. Say we've got a real-valued function F with mean 0 on the Boolean n-cube, and it's 'Lipschitz' in the sense that it changes by at most +-1 across any edge. Then the 'phenomenon' is that the function is very tightly concentrated around its mean. In fact, for a fixed c > 0, the fraction of points x where F(x)> cn is bounded by an inverse-exponential function of n (one can do better regarding the deviation size, but in addition to being low-class, I'm a little lazy).

Note that Concentration of Measure here generalizes the basic Chernoff bound for unbiased voting, since F(x) = ||x|| - n/2 is Lipschitz (where ||•|| denotes Hamming weight).

Before reading on, think about possible connections to graph expansion. This was the factor that got me excited.

Suppose the statement is false; then for some c > 0, there must be a sequence of functions F_n, defined on Boolean cubes of increasing dimension n, such that the 'atypical sets' S_n = { x: F(x) > cn } are not exponentially insignificant compared to 2^n. The key observation is that, if we take

r_n = max r: S_n has at least the volume (# of points) of an r-ball in r_n,

then lim sup (r_n/n) is at least 1/2. (This because, for all s < 1/2, sn-balls are exponentially dinky in n when compared to the n-cube.)

Now we're going to use the expansion properties of the n-cube: the intuitive 'Isoperimetric Inequality', which says that a set at least as big as a k-ball taken with its vertex neighborhood are collectively at least as big as a (k+1)-ball. In other words, balls minimize (vertex-) surface area. There is a nice proof by iteratively 'squashing' a set into ball shape, which can be found here.

What's the use of this? Consider S'_n = { x: F(x) > cn/2 }. Any points within cn/2 distance of S_n are contained in S'_n (by the Lipschitz property); so, by iterative application of the Iso Inequality, S'_n must be as big as an (r_n + cn/2)-ball. Since we know r_n/n gets very close to 1/2 i.o., this tells us that S'_n is bigger than an (1/2 + c/4)*n-ball i.o.; and such balls fill all but an inverse-exponential-in-n fraction of the whole cube. Now there is no way these F_n can have mean 0. We're done.

What's used about the n-cubes in this proof, as a graph family? Nothing more nor less that they have modestly good expansion properties. We can modify this proof to show even stronger Concentration of Measure for 'typical' graphs, which are better expanders even if they are regular with constant degree. I see a weak converse, too.

Food for thought: it's not too hard to come up with a notion of expansion for e.g. smooth manifolds; but what pray tell might we mean by a 'typical' manifold, and how is its expansion?


Monday, June 12, 2006

Pseudorandomness, Coding, and Complexity

What's shaking: I've graduated with a BA in Math, and I'm having a great summer in Amherst doing... largely more math. Let me describe a snippet. First, a broad, partial, and subjective overview of the body of research I have been (for many months now, but with many distractions) familiarizing myself with:

1. In the late 80s and 90s, a line of work based largely on a paper by Nisan and Wigderson showed how to derandomize algorithms based on a hardness assumption (see, e.g., Sanjeev Arora's textbook on his website, for this and other complexity theory to recent to be covered by my longtime favorite reference, Papadimitriou's). This involved a 'worst-case to average case reduction', showing that if one problem is hard for small circuits in the worst case, then another one is hard for small(er) circuits to even approximately compute.

2. Prominent researchers like Sudan, Trevisan, and Vadhan (who gave one proof of the result alluded to) have promoted the view that such reductions are essentially corollaries of efficient encoding and decoding algorithms for error-correcting coding and some variants on this concept.

My take: On the one hand, they are clearly correct that these theorems can be derived as corrolaries of said algorithms. It seems equally clear that this perspective is a very fertile one to take. My only reservation (which I'm sure these researchers share) is that we should still look for other ways of viewing these results, which might suggest new theorems in turn. Where should we look?

3. Error-correcting codes, I am becoming aware, are one creature among many existing in a loosely defined bestiary of combinatorial objects possessing 'pseudorandom' aspects. (see Trevisan's survey 'Pseudorandomness and Combinatorial Constructions'.) They live among hash families, expanders and extractors, low-discrepancy sets, pairwise-almost-independent functions, and on and on. What unites these objects? One or both of the following properties:

i) When we look for (X), we wish we had randomness. It is typically easy to randomly construct these objects, with good if not optimal parameters, with high probability. On the other hand, efficient deterministic constructions are, to varying degrees, 'tricky', or at least require types of math that weren't as much a part of the CS toolbox before the demand for these types of objects came around... regarding which demand,

ii) When we wish we had randomness, we might settle for (X). Randomized algorithms often only use a small number of the likely properties of random sequences, and smaller 'pseudorandom' distributions might possess these properties too. For example (see the textbooks 'Communication Complexity' or 'Randomized Algorithms' for more on this one), say we want to see if two databases on n bits are equivalent with minimal communication between them. Suppose we communicate random bits to select completely at random a Boolean function from n bits to k. With probability (1 - 1/(2^k)), two distinct databases will have distinct values under this random function and so will be 'told apart' by these signatures.
However, it'd take 2^(kn) bits to describe the function, way more than just communicating the whole databases. What we want is a much smaller ensemble of 'pseudorandom' functions that behave like truly random ones at least for this purpose. The now-standard solution (at least in theoretical CS) is a concept called universal hashing.

What's great about these classes of objects is that, though some explicit constructions are really involved, there are interesting 'black-box reductions' between many of the families. There's a simple transformation, e.g., taking any sufficiently good expander graph and outputting a good linear error-correcting code. So there are natural 'affinities' between many properties of random structures that allow one to trade between them. I thoroughly recommend, as one studies these objects, to keep a running 'reductions diagram'. These reductions don't usually give the best parameters, but the mere fact that they exist seems important.

Now let me describe something I came up with (which, however, is probably not new). I think it both illustrates, in a distinct setting from the one described earlier, the 'Coding Thesis' that coding algorithms can undergird a lot of complexity theory, *and* the 'Substitutability Thesis', namely that pseudorandom objects can stand in for each other in many cases. It would be nice to have more examples where something other than coding theory stands in for coding, but mine brings coding in to replace hashing.

I will assume that readers are familiar with the result of Valiant & Vazirani that NP is contained in RP^(Parity-P). (If not, see Papadimitriou or Arora.) What always weirded me out is that the 2 proofs I'd read--based on random weight functions and random hyperplanes--both guaranteed the isolation of *unique* solutions with high probability, whereas one might think we could save randomness by settling for the guarantee of just *oddly many* solutions with high probability.

Well, I found an alternative proof (still incomplete) that indeed doesn't guarantee or use isolation. It's all about 'oddification'. The downside is, it doesn't seem to be more randomness-efficient, and, obviously, doesn't prove NP is in RP^USAT. It does, however, use the power of Parity-P in a more essential way than previous proofs.

The proof idea: Let input x of length n to an NP machine M be fixed. Consider the characteristic vector v of length m = 2^(poly(n)) that describes all accepting computations of M on x (assume there are some). Let A be an Rm x m 0-1 matrix for a good linear code (mapping m-bit vectors to Rm-bit vectors). R can be a constant, by an easy existence proof; I am assuming that there is a family of such A for which R is constant and individual entries of A can be computed in time O(log m). I will have to check this, but it seems likely to be true now or in the near future (perhaps using expander codes).

For linear codes, good minimum distance between codewords is equivalent to high number of 1's in any nonzero codeword ENC(v) = Av (all arithmetic mod 2). So Av has many ones; I think we can get a constant fraction in our explicit family.

The algorithm: pick independently at random a few rows a1, ... am of A (each indexed in the range 1 ... Rm, so requiring only poly(n) random bits.) Then, each dot product ai*v can be determined with a Parity-P query (for ai*v = parity of the number of accepting computations whose A-matrix entry, in row i, is 1). If we pick enough and v isn't the zero vector, then with high probability at least one query will come out a 1. If v is zero, of course, all come out zero.

More updates soon to come, hopefully; I'm also learning about Fourier analysis of Boolean functions, with applications to the class AC_0.