# Andy's Math/CS page

## Tuesday, December 05, 2006

### A Damn Good Puzzle (Oops)

Feel like an exercise in the kind of infinite mathematics I described two posts ago? What follows is a slightly dolled-up version of a puzzle posed by the great mathematician Kolmogorov to an auditorium of Soviet middle-school students, and solved by a young Leonid Levin, future co-discoverer of NP-Completeness. This I learned from the entertaining and important book Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists by Dennis Shasha and Cathy Lazare. (Shasha is also a noted puzzle-poser.)

The Great Censor has, in his inscrutable wisdom, laid down a judgment on every finite string of letters in our alphabet--each is either allowed or forbidden.
We, the petty censors he commands, are to intercept various texts and classify them according to his precepts. However, when spaces are not used in texts, there is a potential ambiguity that makes life difficult.
For example, suppose ab is an allowed word and ba is forbidden (there are all sorts of other judgments as well).

Say the petty censors intercept the infinite string of symbols ababababab... .
One way to 'parse' it would be ab ab ab ..., and we'd judge it wholly allowable.
But another way would be as a ba ba ba ba ... and this would be wholly forbidden except for possibly the first word a.

The Great Censor doesn't mind a little arbitrariness in our choice of parsing--he is one arbitrary dude himself. However, he expects our judgments to be resounding; he doesn't like it if a parsed text alternates back and forth between allowed and forbidden words.

Achieving this has been a matter of great vexation among the censors. However, one day a clever young apprentice censor, still far from internalizing the entire system of judgments, had a realization:

No matter how the set of finite strings is divided between allowed and forbidden by the Great Censor, and no matter what infinite string of letters a petty censor is presented with, it will always be possible to parse the infinite string in such a way that the parsed words are either all allowed or all forbidden--with the possible exception of the first word, which may disagree with the others.

So why is this true?

Labels:

• umm..this is not too hard given your earlier post :)

By  Anonymous, at 10:57 PM

• Agreed. But is this result really 'just' a repackaged version of something I discussed earlier, say of Konig's Lemma? The difficulty I see is that, depending on how it is construed, the decision tree for censors is either not locally finite, or else infinite paths in the tree don't necessarily correspond to valid parsings (since there are no infinite words). But I'd like to know your thoughts.

Recursive combinatorics, as described earlier, gives us an interesting framework for thinking about these matters. It asks, is there an algorithmic reduction from finding good censor-parsings to finding infinite paths?

More formally, are there algorithms M, N, such that, if D is a 'dictionary' oracle classifying words as allowed/forbidden, and A is a 'text' oracle giving an infinite string,

the algorithm M (with oracle access to D & A ) computes the structure of an infinite, locally finite tree T
(say M(x) decides whether a string x is on the tree, and if so outputs its parent and children),

and this T is such that, if N is given oracle access to a description S of any infinite path in T, N computes a parsing of A that meets the censor's requirements?

By  Andy D, at 9:24 PM

• Hi Andy. I'm usually lurking, but I'm a regular reader, and I'm really enjoying your blog. I want to suggest a solution to the censor problem which I think is different than the one you thought of.

My solution uses the infinite Ramsey theorem. The intuition behind using Ramsey's Theorem is that Ramsey's theorem states that in a large enough graph, there is a not-too-small clique or independent set. Here, in the question, you are looking for just such a thing -- a parsing (which is kind of like a subsystem, at least in the sense that you have lots of options to choose from), where you want the parsing to be either all-allowed or all-censored. This is like wanting to find either a clique or an independent set.

Here is the infinite Ramsey Theorem , stolen from wikipedia with some changes, for reference sake: Let X be some countably infinite set and colour the elements of X(2) (the subsets of X of size 2) in two different colours. Then there exists some infinite subset M of X such that the size 2 subsets of M all have the same colour.

Here is the proof. Let us suppose we have some partitioning of all finite strings to allowed and disallowed, and we get an infinite string s which we want to parse. We build an infinite graph G and apply the infinite Ramsey Theorem. G's vertices are the natural numbers. G is the complete graph, so it has an edge between every two vertices. The edge going between i and j, where j>i, is colored blue if the substring s[i+1..j] is allowed, and the edge is colored red if this substring is not allowed.

Now, we use Ramsey's theorem on G, and we get an infinite set of vertices M that induces a monochromatic subgraph. Suppose that this subgraph is all-blue. (the case for all-red is similar). Suppose that M={i_1,i_2,i_3,...} where i_1 < i_2 < i_3 < ... . Then we define the partitioning of s to be s[1..i_1], s[i_1+1..i_2],s[i_2+1..i_3], and so on. From the fact that M induces an all-blue subgraph, we get that all of the words except possibly the first one are allowed. QED.

Now, here's an observation and a question: Ramsey's Theorem has a finite analogue, which states that if you want to get a clique/independent-set of size k, then you'll find such a thing in every graph of size at least R(k), where R is some function that grown super-exponentially with k. Defining a finite analogue of the censor problem is not straightforward. You'd want to say that the Great Censor only allowed or disallowed all strings of length up to k, and you are given a string of length n and want to parse it to subwords of length at most k, such that either all of the subwords are allowed, or all are disallowed (except the first subword). Is there always n large enough so that for every word of length at least n you can do this?

I would expect the answer to be yes, but it is no: suppose that the alphabet is {a,b}, and the censor allows all words that contain at least one a', and disallows all words that consist only of b's. Now, for every n which is at least 2k+1 you can define the word b^{n-1}a. It is easy to see that this word cannot be partitioned in a legal way.

Maybe this isn't a big surprise, because the censor problem isn't really like Ramsey's Theorem: we're looking for a partition rather than a sub-structure. There are other difference as well. However, I still find it strange that you can prove the infinite censor problem using Ramsey's Theorem, but you can't prove the finite version using the finite Ramsey theorem (because the finite censor problem is false).

And I promised a question: Can you find a finite analogue of the censor problem that _is_ true?

By  Elad Verbin, at 5:50 PM

• Wow... thanks Elad, you just made my day. Thanks for your insightful comments. (I guess my blog is now an 'international sensation'...?)

It is gratifying to hear from anyone who enjoys this blog, and encourages me to keep at it, so I encourage any feedback, regardless of your background (and I do hope to cater to a variety of backgrounds & interests).

In this case, Elad, you've improved my understanding of the parsing problem. In particular, your solution answers the open problem I posed at the end of the following post, since (I just looked it up on google, look for 'Recursive Ramsey Theory') recursive infinite graph 2-colorings have Pi_2-computable infinite monochromatic subgraphs.

I think this is really interesting in combination with the hardness result I mentioned on finding all-allowed-word parsings (assuming my result is correct). It implies you can program super-arithmetic complexity into all allowed-parsings, or all forbidden-parsings, but not both at once--the Ramsey effect interferes.

***

I knew about the basic infinite Ramsey theorem, but for some reason I never tried to apply it here (I was thinking 'tree', not 'graph'..). For the record, here is my solution, informally:

Call position i an 'allowed-lock' if s[i+1]...s[j] is an allowed string for all j > i. Similarly for 'forbidden-locks'.

If there are infinitely many allowed-locks, we can parse at these points to create an all-but-one-word-allowed solution. Infinitely many forbidden-locks gives an all-but-one-word-forbidden solution. If there's only finitely many of either lock type, we jump over all of them with our first parsed word, and then we can proceed obliviously and make the remaining words either all-allowed or all-forbidden, as we prefer.
(Actually, my solution can resolve Problem 3 above too now that I think about it. Still, yours is better.)

***

As far as your question, I think Ramsey theory does give (by direct translation after your model) an interesting finite analogue for the censor problem:

For every k, there exists an n, such that every text of length N > n

(over any alphabet, and relative to any allowed/forbidden classification)

has a parsing into at least k+1 words, all of which share a classification (except possibly the first and last).

Clearly Ramsey theory is a useful perspective on the problem. But I think 'generic' Ramsey theory will be suboptimal. In the above finite version, it tells us n = R(k, k) will do the job; but if the alphabet is binary, say, the underlying graph is very restricted in its possibilities, so we should be able to find a smaller bound.

By  Andy D, at 2:14 AM

• Heck, you don't even need to find a monochromatic clique, just a monochromatic path, though it's got to respect the text-derived linear ordering.

Hmm.. Dilworth's Theorem? Erdos-Szekeres Theorem? Those pop to mind as potentially useful.

By  Andy D, at 2:26 AM

• Thinking back, I'm pretty sure Jukna's 'Extremal Combinatorics' contains a result (Kleitman's I think) that is close to or identical to the type of theorem I'm looking for. If it's the one, we can replace n = R(k, k) with a much nicer n = O(k^2). But I left my copy at school when I came home for the break. The proof was short but damnably clever, I recall...

By  Andy D, at 3:22 AM

• OK, after a night's fitful sleep I remember the theorem and the proof. It's not quite what's needed in the parsing problem but I am trying to apply it.

The theorem: suppose we give distinct numerical labels to each edge of a complete graph on k vertices; then there is an 'ascending walk' in the graph (i.e. edge labels get bigger as you go.. we get to choose the starting point) of length at least (k-1)/2.

Proof: add the edges into the empty graph one by one, in ascending label order.

Let l_i be the length of the longest label-ascending path ending at node i in the edges added so far;

then when we add in an edge (i, j), either l_i or l_j or both must increase. (Think about it.. remember (i, j) is the highest-labeled edge so far.)

By the pigeonhole principle, after adding in all k*(k-1)/2 edges, some l_i must reach value at least
(1/k)* k*(k-1)/2 = (k-1)/2. QED

PS. I used to worry that I didn't dream about math/CS. But I found that if I think about the stuff obsessively right before bedtime, it basically forces a low-level math delirium throughout the night. However, I'm not sure if it helps me solve math problems or just screws up my sleep, so I don't do it much.

By  Andy D, at 2:14 PM

• Update--see latest post!

By  Andy D, at 3:59 PM

• My solution is as following, I am not sure it works. I will focus on binary strings (0/1), it is easy to generalize it.

Let's assume the statement is not true. Define a probability distribution on all finite string colorings such that
P(w_bad) > 0 and P(w_good) > 0 where w_bad is a coloring (which should exist) for a string s_bad (which should exist as well) such that we cannot do all censored or all uncensored. w_good is some trivial coloring (say, give all finite strings 0).
Now, look at s_bad and create an index set of the form (i,j), starting with (1,1) (1,2) (2,2), (1,3) (2,3) (3,3) where those indices define substrings in s_bad (or any string).
Now, define X_(i,j)(w) = 0 if coloring is bad and 1 if coloring is good.
We know that the event A = { exists lim X_(i,j) } is in the tail sigma-field, so by Kolomogorov law, its probability should be either 0 or 1. On the other hand, P(w_bad) > 0 and P(w_good) > 0, and for the first the lim does not exist, and for the second the lim does exist - this means we cannot have P(A) being 1 or 0. Hence, a contradiction, which comes from the assumption that the pair w_bad and s_bad exist.

By  Anonymous, at 7:27 AM

• Hi Anon,
I don't quite follow your argument. First, what is s_bad? Is it a single string, or does it range over a collection? Also, for any finite string, you can always make it one block (ie one word), and therefore all-censored or all-uncensored. The difficulty is in passage to infinite strings.

Your idea of using the 0/1 law is interesting, though, and I would enjoy seeing a full solution of this sort... I'll think about it.
(I'll be away from the computer for the next ~24 hrs.)

By  Andy, at 8:58 AM

• Sorry, let me make it more clear (my argument also had a flaw because of independence assumptions that did not exist.)

I want to use proof by contradiction. Let's assume that it is not true that there is such spacing. Then, there exists a w_bad which is a function from finite strings to {censored,uncensored} and infinite string s_bad such that no spacing of s_bad will lead to a sequence of censored or uncensored starting from a certain location.
Now, build the following probability space:
Let Omega = {w_bad,w_triv} x {0,1}
where w_triv gives "censored" (0) to everything, and each element in Omega has probability 1/4.
Define the index set I: I = {(1,1),(1,2),(2,2),(1,3),(2,3),(3,3),...}
(in this order) and define
X_{(i,j)}(w,b) to be the result of w (which can be w_bad or w_triv) on the substring denoted by (i,j) in s_bad XORed with the bit b (which is 0 or 1).
Now, clearly X_k are independent (where k ranges over I in the order above), because of the bit b that keeps xoring the result randomlly. In addition, for w=w_bad and b=0 we know that there limit of X_k does not exist (if it exists, then at some point X_k is fixed such that all substrings are 0 or 1 from a certain location, which means we have the property we want and that we assumed that did not exist.)
What about w=w_triv and b=0? We know that for this case limit of X_k does exist, because X_k(w_triv,0)=0 all the way.
This means that the event A = { (w,b) | lim X_k exists } has probability neither 0 nor 1. However, the event A belongs to the "tail sigma-field" of X_k which are independent (hopefully you know this definition), and therefore by the 0/1 Kolmogorov law, it has probability 0 or 1 hence the contradiction.
Again, hope I got it right.
Can you refer me to the original solution of the middle school pupil?

Shay

By  Anonymous, at 1:55 PM

• Hi Shay,
So you're just flipping one random bit b here? I don't think this random masking will generate full independence of your variables, even if they're pairwise independent. For example, if you start looking at the X_k and see that they're all the same value, this will allow you to infer that probably w_triv was chosen. And I believe I can cook up easy examples where the 0/1 law fails for pairwise-independent rvs.

I still think a proof such as this may be possible, but I also think it's somewhat overkill in that the 0/1 law may be of a higher proof-theoretic strength than is required for this application.
(As I discuss in the comments and follow-up post, I think the 'right' proof is the one that shows you the degree of difficulty of the problem of producing a 'homogeneous parsing' for a given sequence.)

I think the original solution is found in the book 'Out of their Minds' I mentioned in the post. See also the other solutions in the comments, as well as a few later posts in which I do some follow-up discussions. (and if you enjoyed this one, you might also enjoy the puzzles I describe later based on recursion theory.)

Or, you could email the 'pupil' himself, Prof. Leonid Levin of BU.

By  Andy, at 8:32 PM

• Andy,
You are correct. With one bit, the X_ks will not be independent. Whenever X_k will point to the same substring in s_bad, then knowing one immediately tells us what is the other, since b is sampled only once.

I will try to do it a little bit differently... This time b will be an infinite vector of bits, (e.g. 01110011...).
The new probability space now gives 1/8 to (w_bad,000000..), 1/8 to (w_bad,1111...), 1/8 to (w_triv,0000...), 1/8 to (w_triv,11111..) and the rest of the probability mass (1/2) is divided uniformly among pairs (w,b) (note that each b can be thought of as a real number in binary representation, no problem defining uniform distribution over that.)
Now, the X_ks are independent (right?). We follow the same logic like before, showing that the probability is neither 0 nor 1 for { lim X_k exists }, although 0/1 Kolmogorov law tells us it should be. The contradiction comes from the same assumption like before.
Oh well... A little bit late here, hope it is correct. If you find another flaw, it would be great (and not disappointing)...

I will try to find the original solution. Thanks for the blog, I find it quite interesting.

By  Anonymous, at 11:48 PM

• Still not indepenent. Never mind.
Will try to think of that later.

By  Anonymous, at 1:59 AM

• Andy,
I think I solved it eventually...
the way I solved it is a little bit different from what I originally thought would be possible... what I did - ordered all infinite strings lexicographically and proved by induction on that order. first I showed that every string that starts with 1 eventually has a substring at location i to infinity which is "smaller" than the whole string (otherwise it is all 1s). then from ind. hyp. you know that substring can be "parsed" like you wanted (and you just ignore the first part until that point.)
for general string, you can scan it until you find a 1 and then apply the argument starting from that point. for base case, all 0s (smallest string), easy to see it is possible.

anyway, thanks for the question. i read that you thought it is possible to solve it using Konig's lemma, but I am not sure how... i thought of having a vertex v_i,j and an edge between v_i,j to v_j+1,k if both of the same type... but unfortunately, the degree of these vertices is not bound, so i dont see how i can apply konig's lemma.

By  Anonymous, at 7:54 PM

• Interesting... but you have to be careful when doing induction on infinite sets. It only works on certain ordered sets, so-called "well-orders" (see wiki). The lexicographic ordering you seem to be using (if I'm reading correctly) is not a well-ordering: it contains an 'infinite descending sequence' 10000..., 01000..., 001000, ...

To see the problem with doing induction on general ordered sets, consider the claim "all real numbers in the range [-1, 1] are less than or equal to zero."

It's true for the 'base case' -1, and if it's true for all numbers below some value x, it must be true for x as well (right?); therefore it's true for all of
[-1, 1]!

By  Andy, at 1:53 PM

• You are right, I noticed it after I posted the comment, but I thought I would wait when I have maybe time to find better answer...

Anyway, I noticed you mentioned Konig's lemma, but I could not see how to solve it using that lemma. I looked it up, and I think the proof, though, can help to show what you were asking for (I would be glad to know if there is a direct solution using Konig's lemma which requires bounded degree of the graph):

Denote by s a starting vertex, which is connected to all v_ij, where ij denotes a substring of the string we are trying to split into words. Also, put an edge between any v_ij and v_jk if they are both censored or uncensored. The graph is connected because of s. Choose a v_ij vertex. Remove it from the graph and its edges. There could be a few cases:
(a) We ended up with many connected components (possibly infinitely many) and there is also one infinite (in the number of vertices) connected component. Choose a vertex v_jk from that component (has to be one because we just removed v_ij) and apply the same procedure again.
(b) We ended up with many components (possibly infinitely many), and all of them are of finite size. For each such component n like that, set (j_n,J_n) to be the lowest and highest number such that there is a vertex v_i{J_n} in that component and a vertex v_{j_n}I (we can do that because each component is finite). Take a subsequence of this pairs such that J_k < j_{k+1} (possible because each component is finite). Now, all components "attached" with the same pair must have the same "type" (censored/uncensored), otherwise, their J_k could be extended (because there would be a way to go forward for one of them with either censored word or uncensored word.) Also, all components in general must have the same type, otherwise, for any substring which connects between J_k to j_m for J_k < j_m we could either make j_m smaller or J_k bigger. Now, clearly, all of those substrings between J_k to j_m have to be of the same type, otherwise, we could collapse two components of the same type into one component. This means we just found what we need, by taking all those substrings that connect between those components.

If we always do (a), then we found what we need too. Otherwise, at some point we have case (b), which will also lead to the result. This is inspired by the Konig's lemma.
I hope I got it right this time, but I would like to know if this is solvable using Konig's lemma with a bounded degree. I also hope my notation/explanation is not too cumbersome. If it really is complicated, let me know and maybe I could write it down more accurately.
Thanks!

By  Anonymous, at 11:30 PM

• Hey! Coach Factory Online simply wanted to say your website is one of the nicely laid out, most inspirational I have come across in quite a while.