Andy's Math/CS page

Wednesday, December 27, 2006

Fun with Censorship: the Finite Case

This post was inspired by Elad Verbin's insights and question about the 'Damn Good Puzzle'.  See that post's comments section for the buildup.

Here is a finite version of the censor puzzle (which readers should review for the scenario):

Given k > 0, there is an N (independent of the alphabet and allowed/forbidden word classification system), such that any text of length N has a parsing into k words, with all but possibly the first and last words sharing a classification.

Elad's comments suggested this theorem and implied the bound N <= R(k, k) + 1,
where R(k, k) is the kth diagonal Ramsey number.  But R(k, k) grows exponentially in k.
In this post I will show how to get a quadratic upper bound.

Following Elad, given a text s_1, ... s_n of length n we define a graph coloring on the complete graph on vertices 0, 1, ... n.  Edge (i, j) with j > i is colored red if the word s_{i+1} ... s_j is allowed, otherwise, it's blue.  Now, if we can guarantee there exists a monochromatic path of length k in the graph, such that the path is ascending on vertex labels (call this an ascending path), it will naturally induce a parsing into at least k words (possibly k+1 or k+2, depending on whether vertices 0 and n get visited) that meets our condition.
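To make the construction concrete, here is a small Python sketch (the classifier and all names are my own illustration, not part of the puzzle): a word is deemed allowed iff it has even length, and edge (i, j) gets colored by whether the word s_{i+1} ... s_j, which is text[i:j] in 0-based slicing, is allowed.

```python
def allowed(word):
    # toy stand-in for the censor's classification: allowed iff even length
    return len(word) % 2 == 0

def color_edges(text, allowed):
    """Color edge (i, j), i < j, 'red' if text[i:j] is allowed, else 'blue'.
    Vertices are the n+1 'gaps' 0, 1, ..., n around the n symbols."""
    n = len(text)
    return {(i, j): ('red' if allowed(text[i:j]) else 'blue')
            for i in range(n + 1) for j in range(i + 1, n + 1)}

colors = color_edges("ababab", allowed)
# edge (0, 2) spans the word "ab" (even length), hence red;
# edge (0, 1) spans "a" (odd length), hence blue
```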

Given an index i <= n and a 2-colored complete graph G on n+1 vertices, let R_i be the number of edges in the longest all-red ascending path in G that ends at vertex i.  
Similarly let B_i be the number of edges in the longest all-blue ascending path in G ending at vertex i.

Say i < j.  If (i, j) is colored red, then R_j >= R_i + 1, since any ascending red path ending at vertex i can be extended to vertex j via edge (i, j).

Similarly, if (i, j) is blue, then B_j >= B_i + 1.
In either case the (nonnegative) integer vector (R_j, B_j) does not equal (R_i, B_i).
Thus the n+1 vectors (R_0, B_0), ..., (R_n, B_n) are pairwise distinct.
There are only k^2 distinct nonnegative integer vectors having both entries at most k-1;
so if n >= k^2, some R_j or B_j must be at least k.  QED.
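The dynamic program implicit in the proof can be written out directly. This sketch (the function names and the toy even-length classification are mine) computes the vectors (R_j, B_j) for every vertex, and lets one check that they really are pairwise distinct:

```python
def mono_path_lengths(text, allowed):
    """For each vertex j, compute (R[j], B[j]): the number of edges in the
    longest all-red / all-blue ascending path ending at vertex j."""
    n = len(text)
    R = [0] * (n + 1)
    B = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            if allowed(text[i:j]):            # edge (i, j) is red
                R[j] = max(R[j], R[i] + 1)
            else:                             # edge (i, j) is blue
                B[j] = max(B[j], B[i] + 1)
    return R, B

# toy classification: a word is allowed iff its length is even
R, B = mono_path_lengths("ababa", lambda w: len(w) % 2 == 0)
vectors = list(zip(R, B))   # pairwise distinct, as the proof guarantees
```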

What I think is cool about this proof is that the measure of progress is very loose.  (R_j, B_j) doesn't strictly dominate (R_i, B_i), it just isn't equal, and the vectors just pile up until some vector--we can't predict which--necessarily exceeds our 'victory' bound.

The result may still be suboptimal for e.g. graph colorings induced by binary alphabets, since these are quite restricted.  Who knows, maybe there's a linear upper bound in this case.

If you liked this result, you should definitely check out the book Extremal Combinatorics by Stasys Jukna, which has all sorts of cool related stuff (and complexity theory applications, no less!  This is my second plug for the book, so take heed.).  So much stuff that I can't even rule out that it contains this result, perhaps as one of its many exercises--but my copy is in San Diego and I'm up in Berkeley on winter break.

It's been a year since I started this blog.  I intend to keep it rolling, and have several post ideas queued up, but I'd love to hear from you, dear reader, to get to know you and develop this site in conversation.

Addendum: I now realize that this result can also be derived from Dilworth's theorem (exercise, for those who care). While it doesn't seem to follow from Erdos-Szekeres (a special case of Dilworth), that theorem can be proved analogously to ours, so I probably read such a proof at one point. Nothing new under the sun...


Saturday, December 23, 2006

Trees and Infinity, part II

Previous: Part I

Next: Part III

Time for another short installment on trees and recursive combinatorics, one that will address a question raised by the previous puzzle and take us on a journey beyond the arithmetic hierarchy (cue your favorite sci-fi theme music...).

To recap, Konig's Lemma guarantees that a locally-finite infinite tree has an infinite path.

Suppose some machine M(x) computes membership of x in a locally-finite tree T; if x is in T then M(x) outputs the parent and immediate children of x. Then there is no guarantee that T has a computably-describable infinite path; however, such a path can be computed, uniformly in M (assuming we know the root node), given oracle access to the Halting set.
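Here is a sketch of that oracle construction in Python (the toy tree and all names are mine). The key point: for a computable, locally finite tree, "is the subtree below x infinite?" is decidable relative to the Halting set, and by Konig's Lemma an infinite subtree always has at least one infinite child-subtree, so greedily descending into infinite subtrees traces an infinite path. The oracle below is faked by a finite probe, which happens to answer correctly on this particular tree; for a genuine computable tree no such finite probe suffices.

```python
def in_tree(node):
    # toy tree over {'0','1'}: strings of the form 0...0 1...1 with <= 3 ones
    return '10' not in node and node.count('1') <= 3

def children(node):
    return [node + b for b in '01' if in_tree(node + b)]

def subtree_is_infinite(node, probe_depth=20):
    """Stand-in for the Halting-set oracle query: a locally finite subtree is
    infinite iff it has nodes at every depth (Konig); we probe finitely deep,
    which is exact for this toy tree since its finite branches die quickly."""
    frontier = [node]
    for _ in range(probe_depth):
        frontier = [c for x in frontier for c in children(x)]
        if not frontier:
            return False
    return True

def infinite_path(steps, root=''):
    """Greedily descend, always into a child whose subtree is infinite."""
    node, path = root, [root]
    for _ in range(steps):
        node = next(c for c in children(node) if subtree_is_infinite(c))
        path.append(node)
    return path
```

On this tree every branch through a '1' dies within a few steps, so the greedy path is forced down the all-zeros spine.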

Konig's Lemma fails if the nodes can be infinite-degree, even if the tree is infinite at every height (Exercise). But consider the following computational problem:

INF: Given a Turing machine M describing a tree T (not necessarily locally finite), does T have an infinite path?

This question has a superficial simplicity: it is purely existential. However, the object it seeks is infinite, and unlike in the case where Konig's Lemma applied, there is no obvious finite 'no' certificate. Here's the astounding truth of the matter:

Problem 1: There exists an algorithm which, given any formula F of first-order logic, outputs a polynomial-time Turing machine M describing a tree T such that T has an infinite path if and only if F is true.

Solving the problem boils down to this: suppose we've got a statement of the form Q: "there exists a number x such that P(x) is true", and we know how to convert each individual P(i) to a tree that has an infinite path if and only if P(i) is true; we now need to use this knowledge to build a tree corresponding to Q.

Only slightly harder is the other case that arises: if Q is of form "for all numbers x, P(x) is true", where we again know how to build trees for each individual P(i). Still, this is pretty cool, since you are effectively reducing a universal condition into an existential one. Once you can handle both cases, you can systematically attack an arbitrary first-order expression.
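The existential case can be sketched quickly (the encoding and all names below are mine, and the root is truncated to finitely many children only so the demo terminates; in the real construction the root has one child per natural number). Given trees T_i for the statements P(i), hang each T_i below a fresh root: the combined tree has an infinite path iff some T_i does. The universal case, as noted, takes more work and is left aside here.

```python
def exists_tree(subtree_children, how_many=3):
    """Tree for 'exists x P(x)', truncated to x < how_many for the demo.
    Nodes: () is the root; (i, v) wraps node v of the tree T_i for P(i)."""
    def children(node):
        if node == ():
            return [(i, 0) for i in range(how_many)]  # root of each T_i is 0
        i, sub = node
        return [(i, c) for c in subtree_children(i, sub)]
    return children

def chain_or_leaf(i, node):
    # toy P(i): its tree is an infinite chain 0, 1, 2, ... only when i == 2,
    # i.e., P(i) holds only for i = 2
    return [node + 1] if i == 2 else []

children = exists_tree(chain_or_leaf)
frontier = [()]
for _ in range(10):   # descend 10 levels; survival witnesses the path via i = 2
    frontier = [c for node in frontier for c in children(node)]
```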

***Interlude: A Review***

If you're not up on first-order logic, just consider F a statement of form

"the game G(X1, X2, ... Xk) is a win for player 1 (assuming perfect play)",

where G is a computable function from N^k --> {0, 1} that defines a game as follows: players 1 and 2 take turns choosing the k values X1, X2, ... Xk, and player 1 wins if and only if G evaluates to 1 on the chosen input.

This 'computable game' formulation turns out to be equivalent to first-order logic; i.e., given any F we can produce a G such that G is a win for player 1 if and only if F is true, and the number of play-rounds of G equals the depth of alternating quantifiers in F; there is also a reduction in the reverse direction.
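The game semantics can be evaluated by a simple recursion when the moves range over a finite menu (the finiteness, the function name, and the example game are my own choices, just so the recursion terminates): player 1's positions are won if SOME move wins, player 2's if ALL replies leave player 1 winning. This is exactly the exists/forall alternation of the corresponding first-order sentence.

```python
def player1_wins(G, k, prefix=(), menu=range(2)):
    """Does player 1 win the k-round game G under perfect play?"""
    if len(prefix) == k:
        return G(*prefix) == 1
    outcomes = [player1_wins(G, k, prefix + (x,), menu) for x in menu]
    if len(prefix) % 2 == 0:       # player 1's move: some choice must win
        return any(outcomes)
    return all(outcomes)           # player 2's move: every reply must be won

# "exists X1 forall X2: X1 >= X2" over {0, 1}: player 1 plays X1 = 1 and wins
wins = player1_wins(lambda x1, x2: 1 if x1 >= x2 else 0, 2)
```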


So how hard is the computational problem of checking for infinite paths? If I've got my definitions right, it's complete for the first level of the analytic hierarchy: existential second-order logic.

Now here's a tie-in to the 'censor puzzle' I discussed recently. Given a 'classification' C of finite words over {0, 1} as allowed or forbidden and an infinite 'text' bitstring D, both poly-time computable, let PARSE be the computational problem of determining whether D can be parsed so as to appear wholly allowed with respect to C.

Problem 2: Show that PARSE and INF are Turing-equivalent.

The challenge in reducing INF to PARSE (not too hard to overcome) is that, because our alphabet is finite, the same words pop up again and again; the decisions we make in building up C keep coming back to constrain us, whereas we want continual freedom of movement in encoding new and higher levels of the tree we are attempting to model in a PARSE instance.

In the reduction I came up with, it's always easy to find a wholly-forbidden parsing of the text string. Thus, as an open problem (update: solved! In the negative. See discussion in comments of Part I.) I ask,

Problem 3: Exhibit a PARSE instance, such that any parsing that is either wholly-allowed or wholly-forbidden (with the possible exception of the first word in each case) is not describable in the arithmetic hierarchy.


NP-completeness theory has transformed the way we regard finite optimization problems.
NP-complete problems require hard search, but on a basic level it's 'the same' hard search
in all cases.

Perhaps less widely known is that a similar phenomenon pervades higher
mathematics. Of course, computer scientists know that most natural
undecidable problems are so because they can be used
to solve the halting problem. If they are much harder (as with PARSE),
they are still likely to have a natural classification using the analytic
hierarchy or some other 'mega'-complexity class.

But there is more. If a highly nonconstructive proof seems to require
the axiom of choice, chances are the theorem being proved also
constructively implies some version of the axiom of choice and can be
classified as 'complete' for a certain logical class. This is a theme
explored by 'reverse mathematics', an area I'm only beginning to
learn about.

Revisiting familiar theorems with a view towards establishing their
complexity, informational, or logical content lends them new grandeur
and enriches our sense of the interconnectedness of the mathematical world.


Tuesday, December 05, 2006

A Damn Good Puzzle (Oops)

Feel like an exercise in the kind of infinite mathematics I described two posts ago? What follows is a slightly dolled-up version of a puzzle posed by the great mathematician Kolmogorov to an auditorium of Soviet middle-school students, and solved by a young Leonid Levin, future co-discoverer of NP-Completeness. This I learned from the entertaining and important book Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists by Dennis Shasha and Cathy Lazere. (Shasha is also a noted puzzle-poser.)

The Great Censor has, in his inscrutable wisdom, laid down a judgment on every finite string of letters in our alphabet--each is either allowed or forbidden.
We, the petty censors he commands, are to intercept various texts and classify them according to his precepts. However, when spaces are not used in texts, there is a potential ambiguity that makes life difficult.
For example, suppose ab is an allowed word and ba is forbidden (there are all sorts of other judgments as well).

Say the petty censors intercept the infinite string of symbols ababababab... .
One way to 'parse' it would be ab ab ab ..., and we'd judge it wholly allowable.
But another way would be a ba ba ba ba ..., and this would be wholly forbidden except possibly for the first word a.
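The ambiguity is easy to check on a finite prefix (the two-word allowed-set below is my simplification; the Great Censor of course classifies every word, and the finite truncation leaves a stray b at the end that the infinite string avoids):

```python
allowed = {'ab'}                  # toy stand-in: only 'ab' is allowed here

text = 'ab' * 5                   # a finite prefix of ababababab...
parse1 = ['ab'] * 5                       # ab ab ab ab ab
parse2 = ['a'] + ['ba'] * 4 + ['b']       # a ba ba ba ba b

assert ''.join(parse1) == ''.join(parse2) == text
wholly_allowed = all(w in allowed for w in parse1)
wholly_forbidden = all(w not in allowed for w in parse2[1:])  # first word exempt
```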

The Great Censor doesn't mind a little arbitrariness in our choice of parsing--he is one arbitrary dude himself. However, he expects our judgments to be resounding; he doesn't like it if a parsed text alternates back and forth between allowed and forbidden words.

Achieving this has been a matter of great vexation among the censors. However, one day a clever young apprentice censor, still far from internalizing the entire system of judgments, had a realization:

No matter how the set of finite strings is divided between allowed and forbidden by the Great Censor, and no matter what infinite string of letters a petty censor is presented with, it will always be possible to parse the infinite string in such a way that the parsed words are either all allowed or all forbidden--with the possible exception of the first word, which may disagree with the others.

So why is this true?