Andy's Math/CS page

Tuesday, August 22, 2006

A Heads-Up for Puzzlers

This just in: I learned from the economics blog  Marginal Revolution that the mathematician Peter Winkler has recently compiled 7 puzzles.  Some were selected for their seeming impossibility, the rest for their seeming triviality.

I would thrust Winkler's Mathematical Puzzles: A Connoisseur's Collection into the hands of anyone interested in problem solving.  His selection of puzzles is superb and especially suited for anyone who (like me) tends to enjoy math most when it has a human element.  Contrast this with the relentless march of inscribed-pentagons and diophantine equations that seems to characterize most Olympiad-style problem books--it's a breath of fresh air.  Discrete and algorithmic mathematics also tend to figure prominently, another plus.

Of the new crop, I've only seen one before (number 3, the 'suicidal monks' problem, here in a more generalized form), but it's an amusing and worthwhile puzzle, which could serve as a good lead-in to theories of knowledge and meta-knowledge.

Oh, and it seems these 7 are a part of a sequel puzzle-book in the works.  Thanks, Peter!


Saturday, August 19, 2006

The Impossible

Since this page is now linked to by at least one other site, I should say a few words by way of introduction to a general audience. What motivates the topics I choose for study and exposition? Let me give a loose criterion, which accounts for a good deal but not all of what excites me.

I'm interested in impossible tasks. I want to know where and how and with what frequency they arise, and I want to know what makes them impossible. To a lesser extent I want to know what we can do about them; but I worry that an exclusive concern with the possible makes people hubristic. We need to acknowledge some essential bounds on the range of possible human activity (mental and material) if we are to explore life within those bounds effectively.

What bounds exist obviously depends on who we are, and I am not too interested in entering this debate. Instead, I want robust mathematical tools for taking models of agents and models of situations and determining what is impossible for such agents in such situations.

A good impossibility result should paint a general picture in which we can recognize ourselves and many of our experiences (the match doesn't have to be perfect; the result should be at least potentially 'about people', but not narrowly so). It should model agents as having a significant range of strategic resources, so that the result isn't a triviality and applies in many cases.  It should also be suggestive beyond the confines of its mathematical specifics.  

Let me give five example results, which I believe illustrate several rich strands of research into related but distinctive types of impossibility. I think each is simple enough to be treated as a puzzle by those with mathematical background, yet important and interesting enough to reward even casual thought or unsuccessful attempts at solution. All are well-known.

I. Information-Theoretic Impossibility

This type of impossibility result assumes that information is distributed far and wide and has to be aggregated to solve a task, but that the amount of communication is limited, or the communication lines are imperfect, or both. See my post 'Information Theory and CS' for more details.

i) Two parties, Alice and Bob, each hold a private n-bit string. They want to determine if their strings are identical. Prove that any (deterministic) protocol to solve this problem needs n bits of communication in the worst case.

ii) Two allied generals are hiding with their armies on opposite sides of a valley containing an enemy camp. The generals know that a surprise attack will only be successful if both armies have at least 100 soldiers and they both attack at exactly dawn on the same day. Neither knows the state of the other's army, so they must communicate, but to do this their messengers must cross the valley and risk being killed (this doesn't affect the enemy's readiness to defend, however).
Formalize this task in such a way that it is provably impossible (i.e. any protocol will yield failed attacks, and/or no attack when an attack is feasible), even if 'enough' messengers 'eventually' make it across in each direction.

The basic idea behind the proofs of information-theoretic impossibility results is usually to isolate two 'inequivalent' system states treated as 'equivalent' by any particular protocol due to the protocol's perceptual limitations. This mistake invalidates the protocol.

II. Game-Theoretic Impossibility

This kind of result assumes multiple agents with distinct and conflicting interests, and describes the way this conflict can preclude socially desirable outcomes.  

This type of research is contentious: in general there is a lack of consensus over both how to predict games' outcomes and how to define what is 'socially desirable'. But given any model of these parameters, the basic proof idea is often the following: "Say folks are 'cooperating', yielding a socially desirable state; then there exists an opportunity for some selfish agent to shift his behavior and reap benefits, disrupting the pattern of cooperation."  So instability is a key concept.

So on the one hand, there are results describing how a particular societal arrangement turns out poorly due to conflicts of interest. But there are also higher-level problems as to designing institutions that minimize the costs of conflict. It's the latter that interests me more and where interesting impossibility results (like Arrow's Theorem) seem to lie.

I hesitate to offer the Iterated Prisoner's Dilemma as a 'puzzle' (what's the solution?), and I am unsure whether Arrow's Theorem and the Chaos Theorem of voting theory should really be described as game-theoretic in nature. To be honest, my study of this type of results has been comparatively dilettantish, and I'd appreciate any suggestions as as to unifying themes and representative puzzles. But for the time being I'll suggest in sketch form the Median Voter Theorem (itself a form of the Prisoner's Dilemma):

iii) Two competing ice-cream vendors are deciding where to park their carts along the beach. Customers dislike walking on the hot sand and will go to the closest vendor. Why can't we expect a socially optimal outcome?

Game-theoretic research has had the least impact on computer science of the four categories I'm discussing, but its foothold in CS is growing.  Any comments on the significance/desirability of this development?  I don't yet have a strong opinion.

III. Intractibility-based Impossibility

This is (in my view) by far the most interesting and mysterious form of impossibility: we are given all the information we need to solve a mathematically well-defined problem, but we can't do it, because--why? Because we're not smart enough? What does 'smart' mean, and how smart does 'smart' get?  I was led into wondering about intractibility by my 
youthful experience with the board game Go.

This type of impossibility is the object of study in 'computational complexity'. This theory aims to determine the intrinsic requirements of any algorithms solving particular computational problems. For example, in the 'traveling salesman problem' we are given n cities and distances between them, and asked to produce the minimum-length tour of all cities. We would like an algorithm that reliably produces the answer in n^2 steps, but we strongly suspect such an algorithm does not exist--smart doesn't get that smart. Still, our ignorance in these matters is drastic. See the archives of Lance Fortnow's website for a good online tutorial.

We don't have good tools to diagnose high but finite problem complexity. On the other hand, we're now pretty good at diagnosing *infinite* problem complexity; the following result of Turing is foundational. (Note: I hesitate to pose this as a puzzle; although simple in retrospect, and essentially the first proof I was able to really appreciate in this life, it was one of Hilbert's famous problems, open for 30 years!)

iv) Prove that there is no computer program P (pick your favorite standard programming model, with the idealization of unbounded memory space and runtime) that, given a description of another program Q, reliably determines whether Q ever halts on the input '0'.

As a hint, don't try to understand the internal dynamics of the program P that we've supposed exists. Only consider its input-output behavior, and use it as a subroutine (in a larger program P' that P cannot possibly analyze correctly).

The 'black-box methodology' used in this proof is paradigmatic in much research on intractibility.  It underlines that, like it or not, our main tools for understanding intractable problems are the same constructive algorithmic techniques that we use to solve tractable problems.  The only difference is that we work with imaginary components (like P above) and the goal of our algorithms is to show their own inconsistency.

IV. Cryptographic Impossibility

Modern cryptography considers such complex scenarios that no one schema is likely to fit all of them. Nor can I summarize the sources of impossibility here. But the basic idea is often that agents want to distribute certain types of information without 'leaking' anything more to eavesdroppers, and without revealing more than they have to with each other.

Cryptography uses intractibility to hide sensitive information (witness public-key cryptosystems like RSA). But it is also valuable to see how much of cryptography is possible without the use of intractibility, and hopefully this approach reveals things about the 'rules of information' that are not elucidated by the more fully-cooperative settings of type-I results. The following puzzle is in this vein.

v) Alice and Bob have unlimited computational resources.  Each wants to know if they both have a sensitive medical condition so they can commiserate (each knows only his/her own status). They want a protocol that will allow them both to determine this, without leaking anything more. E.g. if only Bob has the condition, Alice shouldn't find this out (although Bob will necessarily learn Alice's negative status). Can this be done? What if the protocol is randomized, with small error probabilities and info leakages acceptable--how good can we get it?

(The label for this type of cryptographic task is 'secure multiparty computation'.)

Plenty of research remains in each of these four areas.  
Or, one can jump on the bandwagon of 'Internet-centric' computer science, 
and deal with situations like e-commerce in which all of these issues are
simultaneously present.  I wonder, in such cases, is it still coherent or worthwhile to 'decompose' the impossible in a situation into these different factors?  And are there important impossibility factors I'm leaving out, or unfairly conflating?  
Comments welcome.

Labels: , ,

Friday, August 18, 2006

Using Randomness to Derandomize

The 'probabilistic method' is everywhere in combinatorics. There are plenty of good expositions, with Alon and Spencer's among the juiciest. I aim to describe an exquisite application due to Razborov in 1988 that seems little-known (perhaps because the paper's in Russian). I read about it in Jukna's 'Extremal Combinatorics', which I strongly recommend.

The Result

This application revisits one of the very first sites of the probabilistic method: construction of Ramsey graphs. A graph is Ramsey if (speaking qualitatively, here as throughout) it is very large, yet contains no large cliques or independent sets. Erdos argued: pick a random graph by fixing the nodes and then including each edge independently with probability 1/2. Each 'large' subgraph H has only a 'very small' probability q_H of forming a clique or independent set, so by taking the union bound over all of these events for each small subgraph, the probability of any large cliques or independent sets appearing is at most the sum over all H of q_H. So, Ramsey graphs exist if we set the parameters so that this sum is less than 1. In fact the graph size can be set to 2^n while all clique/independent set sizes are held to poly(n).

Razborov goes one up on Erdos with a significantly more explicit Ramsey construction. He shows how the representation can be made exponentially concise.

'Hold on,' you might say, 'I can do that: "The lexicographically first Ramsey graph on 2^n vertices" is even a doubly-exponentially concise description.' Sure; but the obvious shortcoming of this description is it takes incredible time to unpack it if I were to ask you the simplest thing about your graph--say, Is node 2^n connected to node 2^(n-1)? Razborov's representation, by contrast, is a poly(n)-sized, constant-depth circuit that outputs the answers to precisely these edge-queries, on demand. The only thing your description has on his is that, though he shows his to exist, he doesn't exactly know how to find it...


Why do I like this result so much? First, its proof uses graph expansion and the isolation lemma, two important tools I've enjoyed studying. It improves on the probabilistic method using.. the probabilistic method, something that took me by surprise.

The theorem also shows that certain mathematical objects are much more accessible than they might appear. To drive this home, let me suggest a small strengthening of the result: as should become apparent, we can get not just the guarantee that a small-circuit representation does exist, but also away to produce one with high probability. So, if you want a deterministic and simple instruction-kit, small enough to fit in your pocket, for building a Ramsey graph on 2^100 vertices edge by edge, rest assured that not only does there exist such a kit but that you can (probabilistically) build it yourself.

For those familiar with the polynomial hierarchy (PH) in complexity theory: this result is related to (though not implied by) the result of Sipser that BPP is contained in the second level of the PH, in the following way. Let L be a decision problem. Say we've designed a P-time algorithm for L that is almost deterministic, but relies on having oracle access to the incidence matrix of an exponentially large Ramsey graph (doesn't matter which one we're supplied). We could simulate the algorithm in BPP by randomly generating the graph as needed (a la Erdos); but, a la Razborov, we could also simulate the algorithm via a 2-round game: player 1 proposes a small circuit which he claims computes a Ramsey incidence matrix; player 2 is given the opportunity to refute the circuit by exhibiting a poly(n)-sized clique or independent set. If the refutation fails, the players run the algorithm with the circuit for an oracle. With slight modification, this shows that L is in both Sigma_2 and Pi_2, admittedly in a more complicated way then via Sipser's theorem.
Of course, the same result holds if L needs any other type of object amenable to Razborov's method, although I should point out that some important structures guaranteed to exist by the probabilistic method are not so amenable, e.g. expander graphs, whose defining contraints involve much larger vertex sets than the Ramsey constraints. (Of course, we now have good small descriptions of expanders by other, more direct means.)

Finally, this result prefigures one of Razborov's most famous results, one that showed him to be not just a technical virtuoso but a true complexity visionary: the Natural Proofs work with Steve Rudich. The theorem under discussion implies, in the natural-proofs framework, that properties like 'G is Ramsey' (which meets the 'largeness' requirement and, to an extent depending on the strength of the Ramsey property, the 'constructivity' one as well) cannot be used in a natural proof against polynomial-sized circuits (or even the constant-depth ones used in the Ramsey construction). This is what Razborov-Rudich would lead us to expect: if the property could be used in such a natural proof, it would show strong limits on the possibility of one-way functions, pseudorandom generators, and mathematically secure cryptography.

Conclusion: two thumbs up for Razborov's result.

A Proof Sketch

First, we should be clear about one thing: like Erdos' proof, Razborov's is not really about graphs; it only uses them as a test case. The proof method is properly viewed from the standpoint of boolean logic. So, the 'graph' is now an incidence matrix; the property 'G is Ramsey' becomes a conjunction of properties t_H for each large 'subgraph' H, stating that H isn't a clique or independent set; the goal is to produce an assignment to the variables that satisfies all the t_H. Erdos' proof solves this problem with the sole observation that each t_H is very likely to be satisfied by a random assignment. Razborov solves the problem in the stronger sense described after making one (and only one) additional observation about the clauses t_H: each one quantifies over only a few variables--poly(n) of them, the edges in the subgraph H.

Then, viewed in full generality, our task is to produce a circuit computing a certain kind of function f on m variables; by Razborov's observation, the requirements on the function f imposed by the construction task are a collection of properties t[1](f), ... t[k](f) which must be satisfied, each of which depends on only poly(m) bits of f (and is highly likely to be satisfied for random assignments to its f-bits).

Razborov sets up a randomized 'growth process' that builds up a circuit piece by piece. At each stage i we have a circuit C[i](x) computing a function F[i](x) on m variables x.

How does the growth process work? The bottom gate of the circuit is an XOR, and at every stage we just slap on a new argument to it. Each argument is an AND of an appropriate number of independently chosen random linear equations (XORs or negations of XORs) over the variables of x. So, this is a depth-3 circuit if you allow XOR in your basis and unbounded fanin.

The process doesn't converge quickly to a uniform distribution on functions (since most functions don't have small circuits), *but* when we restrict our attention to any fixed poly(m) bits of the output, we do get such convergence. Thus, we can terminate the growth process after poly(m) steps and the probability distribution on the terminal function f(x) = F_(poly(m))(x) is sufficiently 'locally uniform' that each property t[j](f) is satisfied with very high probability. Then we apply the union bound and conclude that with nonzero probability over all runs, f(x) satisfies all of them. So the only remaining task is to explain why this fast 'local' convergence to uniform occurs.

So fix any set K of poly(m) input vectors to the circuit (such as the ones an individual property t[j](f) depends on). Let's focus on what a particular AND-of-XORS does on inputs from K. The Isolation Lemma of Valiant & Vazirani (see Papadimitriou's textbook) kicks in to give us everything we need: if we choose the number of equations right (and we only need O(log(m)) of them), there is a decent chance (at least 1/8) that exactly one of the vectors in K will satisfy each equation, and in this case each vector in K is equally likely to be that one left standing.

Thus, for each vector v in K, each AND we construct has a 1/poly(m) chance of behaving, on input from K, like the function I_v(x) which is 1 iff x = v. These I_v, in turn, collectively form a *basis* for the vector space V of all boolean functions on K, with XOR of functions as the addition operation.

Think about this vector space V for functions on K as a hypercube of poly(m) dimensions, one for each I_v. At each stage of the growth process, we've got a circuit which induces a function on K, that is, a point in V. And in each phase, adding an argument to the bottom XOR, we make a jump to another point in this space, a jump which is independent of where we are in V so far. The question is, does this walk converge quickly (in poly(m) steps) to a nearly-uniform distribution?

Answer: Yes. It's a somewhat delicate exercise in probability, but the intuition is: when Isolation succeeds, the distribution is pushed closer to uniform to the same extent it would be in a simple random walk in V (since the resulting step is uniform over the 'edges' indexed by the I_v's, by the symmetry of Isolation). Isolation succeeds 1/8th of the time, so using the good expansion properties of the hypercube (see a previous post) and the expansion-convergence connection, we're golden, *as long as* the distribution is not made less uniform by the steps in which isolation fails. But even in these cases, we're XORing with a new function that is independent of what has come before, so the resulting effect cannot carry us further away from uniform (XOR is a 'smoothing' operation from this perspective).

This concludes the proof outline.