Andy's Math/CS page

Wednesday, December 27, 2006

Fun with Censorship: the Finite Case

This post was inspired by Elad Verbin's insights and question about the 'Damn Good Puzzle'.  See that post's comments section for the buildup.

Here is a finite version of the censor puzzle (which readers should review for the scenario):

Given k > 0, there is an N (independent of the alphabet and allowed/forbidden word classification system), such that any text of length N has a parsing into k words, with all but possibly the first and last words sharing a classification.

Elad's comments suggested this theorem and implied the bound N <= R(k, k) + 1,
where R(k, k) is the kth Ramsey number.  But R(k, k) is super-exponential in k.  
In this post I will show how to get a quadratic upper bound.

Following Elad, given a text s_1, ... s_n of length n we define a graph coloring on the complete graph on vertices 0, 1, ... n.  Edge (i, j) with j > i is colored red if the word s_{i+1} ... s_j is allowed, otherwise, it's blue.  Now, if we can guarantee there exists a monochromatic path of length k in the graph, such that the path is ascending on vertex labels (call this an ascending path), it will naturally induce a parsing into at least k words (possibly k+1 or k+2, depending on whether vertices 0 and n get visited) that meets our condition.

Given an index i <= n and a 2-colored complete graph G on n+1 vertices, let R_i be the number of edges in the longest all-red ascending path in G that ends at vertex i.  
Similarly let B_i be the number of edges in the longest all-blue ascending path in G ending at vertex i.

Say i < j.  If (i, j) is colored red, then R_j >= R_i + 1, since any ascending red path ending at vertex i can be extended to vertex j via edge (i, j).

Similarly, if (i, j) is blue, then B_j >= B_i + 1.
In either case the (nonnegative) integer vector (R_j, B_j) does not equal (R_i, B_i).
There are only k^2 distinct nonnegative integer vectors having both terms at most k-1;
so if n >= k^2 + 1 some R_j or B_j must be at least k.  QED.

What I think is cool about this proof is that the measure of progress is very loose.  (R_j, B_j) doesn't strictly dominate (R_i, B_i), it just isn't equal, and the vectors just pile up until some vector--we can't predict which--necessarily exceeds our 'victory' bound.

The result may still be suboptimal for e.g. graph colorings induced by binary alphabets, since these are quite restricted.  Who knows, maybe there's a linear upper bound in this case.

If you liked this result, you should definitely check out the book Extremal Combinatorics by Stasys Jukna, which has all sorts of cool related stuff (and complexity theory applications, no less!  This is my second plug for the book, so take heed.).  So much stuff that I can't even rule out that it contains this result, perhaps as one of its many exercises--but my copy is in San Diego and I'm up in Berkeley on winter break.

It's been a year since I started this blog.  I intend to keep it rolling, and have several post ideas queued up, but I'd love to hear from you, dear reader, to get to know you and develop this site in conversation.

Addendum: I now realize that this result can also be derived from Dilworth's theorem (exercise, for those who care). While it doesn't seem to follow from Erdos-Szekeres (a special case of Dilworth), that theorem can be proved analogously to ours, so I probably read such a proof at one point. Nothing new under the sun...



  • Andy,

    I really liked your finite analogue of the censor problem, and your solution for it. It seems like _the_ finite analogue, or at least a pretty damn good finite analogue. And the proof is cool. It's quite surprising that k^2 characters are enough to be able to partition. It seems somewhat like the situation in the Erdős–Szekeres theorem.

    Here is a lower bound that proves that your bound k^2+1 is (almost) tight, even for binary alphabet. Specifically, for any k, I show a binary string s of length k^2-1, and a classification of all words to allowed/disallowed, such that there is no set of k consecutively-appearing subwords of s that are all allowed or all disallowed. As a special case, this implies that you can't partition s into k+2 words, such that all words are allowed or all words are disallowed, except the first and last.

    The classification is as follows: all words that contain at least one `1' are allowed. All words that consist only of 0s are disallowed. Here is the string s: s=(0^{k-1}1)^{k-1}0^{k-1}. In other words, s consists of k-1 `0's, then a `1', then k-1 `0's, then a `1', repeating this k-1 times, and then adding another k-1 `0's. s contains a total of k(k-1) `0's and k-1 `1's, so it has total size n=k^2-1. Now, if s had a set of k consecutively-appearing subwords that are all disallowed, then it would have at least k consecutive `0's, which is obviously a contradiction. On the other hand, if s had a set of k consecutively-appearing subwords that are all allowed, then s would have had k `1's, which is a contradiction. Thus, s has the property claimed.

    Note that this is one unit away from showing that Andy's bound is strictly tight. Andy proved that such an example is impossible to construct if n>=k^2+1. And in my example, n=k^2-1. Where did this one unit go, that's what I want to know. It got lost in the shuffle, I guess. :)

    Also, here's a generalization of Andy's theorem: Suppose that the great censor partitioned all words to m classes, and the petty censors wish to partition every word to k subwords, such that all subwords are in the same class, except maybe the first and the last subwords. I think Andy's proof generalizes to show that every word of length at least k^m+1 can be partitioned in this fashion, and I think the lower bound should extend as well, maybe with some lower-order inaccuracy, at least if you allow an alphabet of size m. I wonder how it works for smaller alphabets...


    By Anonymous Elad Verbin, at 7:40 PM  

  • Funny, you answered my question just before I asked it.

    By Anonymous elad, at 7:43 PM  

  • Elad--good example! That ties things up nicely (except your m-related question), at least until someone generalizes things in a new direction...
    And thanks again for setting the ball rolling on the finite case.

    An observation: testing for the longest monochrome-ascending walk is simple dynamic programming, while testing for cliques/ind. sets is NP-complete;

    then, worst-case examples for ascending walks are simple to describe (you just did), whereas describing worst-case Ramsey graphs seems very hard (although there are some positive results, e.g. see my post 'Using Randomness' on Razborov's work).

    So, is this a coincidence? Is there anything meaningful to be said about the relation between optimization problems and the associated problem in extremal combinatorics?

    I can give two examples that complicate the picture suggested by the current examples:

    i) again, CLIQUE is NP-complete, yet the edge-maximal k-clique-free graphs are simply describable (balanced n/(k-1)-partite graphs) via Turan's theorem;

    ii) It's easy to test in cubic time (less?) if there exist three points in a point set S in n-space that form an obtuse angle, yet for ages no one believed there were point sets of size c^d (for some c > 1) in d-dimensional space that contain no obtuse angles. Now we know there are, but I'm not sure if there are any deterministic constructions known. Certainly not 'simple' ones.

    By Blogger Andy D, at 9:17 PM  

  • by 'optimization problems', I really meant 'search problems' here.

    By Blogger Andy D, at 9:19 PM  

Post a Comment

<< Home