Fun with Censorship: the Finite Case
This post was inspired by Elad Verbin's insights and question about the 'Damn Good Puzzle'. See that post's comments section for the buildup.
Here is a finite version of the censor puzzle (which readers should review for the scenario):
Given k > 0, there is an N (independent of the alphabet and allowed/forbidden word classification system), such that any text of length N has a parsing into k words, with all but possibly the first and last words sharing a classification.
Elad's comments suggested this theorem and implied the bound N <= R(k, k) + 1,
where R(k, k) is the kth Ramsey number. But R(k, k) is super-exponential in k.
In this post I will show how to get a quadratic upper bound.
Following Elad, given a text s_1, ... s_n of length n we define a graph coloring on the complete graph on vertices 0, 1, ... n. Edge (i, j) with j > i is colored red if the word s_{i+1} ... s_j is allowed, otherwise, it's blue. Now, if we can guarantee there exists a monochromatic path of length k in the graph, such that the path is ascending on vertex labels (call this an ascending path), it will naturally induce a parsing into at least k words (possibly k+1 or k+2, depending on whether vertices 0 and n get visited) that meets our condition.
Given an index i <= n and a 2-colored complete graph G on n+1 vertices, let R_i be the number of edges in the longest all-red ascending path in G that ends at vertex i.
Similarly let B_i be the number of edges in the longest all-blue ascending path in G ending at vertex i.
Say i < j. If (i, j) is colored red, then R_j >= R_i + 1, since any ascending red path ending at vertex i can be extended to vertex j via edge (i, j).
Similarly, if (i, j) is blue, then B_j >= B_i + 1.
In either case the (nonnegative) integer vector (R_j, B_j) does not equal (R_i, B_i).
There are only k^2 distinct nonnegative integer vectors having both terms at most k-1;
so if n >= k^2 + 1 some R_j or B_j must be at least k. QED.
What I think is cool about this proof is that the measure of progress is very loose. (R_j, B_j) doesn't strictly dominate (R_i, B_i), it just isn't equal, and the vectors just pile up until some vector--we can't predict which--necessarily exceeds our 'victory' bound.
The result may still be suboptimal for e.g. graph colorings induced by binary alphabets, since these are quite restricted. Who knows, maybe there's a linear upper bound in this case.
If you liked this result, you should definitely check out the book Extremal Combinatorics by Stasys Jukna, which has all sorts of cool related stuff (and complexity theory applications, no less! This is my second plug for the book, so take heed.). So much stuff that I can't even rule out that it contains this result, perhaps as one of its many exercises--but my copy is in San Diego and I'm up in Berkeley on winter break.
It's been a year since I started this blog. I intend to keep it rolling, and have several post ideas queued up, but I'd love to hear from you, dear reader, to get to know you and develop this site in conversation.
Addendum: I now realize that this result can also be derived from Dilworth's theorem (exercise, for those who care). While it doesn't seem to follow from Erdos-Szekeres (a special case of Dilworth), that theorem can be proved analogously to ours, so I probably read such a proof at one point. Nothing new under the sun...
Here is a finite version of the censor puzzle (which readers should review for the scenario):
Given k > 0, there is an N (independent of the alphabet and allowed/forbidden word classification system), such that any text of length N has a parsing into k words, with all but possibly the first and last words sharing a classification.
Elad's comments suggested this theorem and implied the bound N <= R(k, k) + 1,
where R(k, k) is the kth Ramsey number. But R(k, k) is super-exponential in k.
In this post I will show how to get a quadratic upper bound.
Following Elad, given a text s_1, ... s_n of length n we define a graph coloring on the complete graph on vertices 0, 1, ... n. Edge (i, j) with j > i is colored red if the word s_{i+1} ... s_j is allowed, otherwise, it's blue. Now, if we can guarantee there exists a monochromatic path of length k in the graph, such that the path is ascending on vertex labels (call this an ascending path), it will naturally induce a parsing into at least k words (possibly k+1 or k+2, depending on whether vertices 0 and n get visited) that meets our condition.
Given an index i <= n and a 2-colored complete graph G on n+1 vertices, let R_i be the number of edges in the longest all-red ascending path in G that ends at vertex i.
Similarly let B_i be the number of edges in the longest all-blue ascending path in G ending at vertex i.
Say i < j. If (i, j) is colored red, then R_j >= R_i + 1, since any ascending red path ending at vertex i can be extended to vertex j via edge (i, j).
Similarly, if (i, j) is blue, then B_j >= B_i + 1.
In either case the (nonnegative) integer vector (R_j, B_j) does not equal (R_i, B_i).
There are only k^2 distinct nonnegative integer vectors having both terms at most k-1;
so if n >= k^2 + 1 some R_j or B_j must be at least k. QED.
What I think is cool about this proof is that the measure of progress is very loose. (R_j, B_j) doesn't strictly dominate (R_i, B_i), it just isn't equal, and the vectors just pile up until some vector--we can't predict which--necessarily exceeds our 'victory' bound.
The result may still be suboptimal for e.g. graph colorings induced by binary alphabets, since these are quite restricted. Who knows, maybe there's a linear upper bound in this case.
If you liked this result, you should definitely check out the book Extremal Combinatorics by Stasys Jukna, which has all sorts of cool related stuff (and complexity theory applications, no less! This is my second plug for the book, so take heed.). So much stuff that I can't even rule out that it contains this result, perhaps as one of its many exercises--but my copy is in San Diego and I'm up in Berkeley on winter break.
It's been a year since I started this blog. I intend to keep it rolling, and have several post ideas queued up, but I'd love to hear from you, dear reader, to get to know you and develop this site in conversation.
Addendum: I now realize that this result can also be derived from Dilworth's theorem (exercise, for those who care). While it doesn't seem to follow from Erdos-Szekeres (a special case of Dilworth), that theorem can be proved analogously to ours, so I probably read such a proof at one point. Nothing new under the sun...
Labels: general math