Andy's Math/CS page

Wednesday, July 21, 2010

Injective polynomials

From a paper of MIT's Bjorn Poonen, I learned of an amazingly simple open problem. I'll just quote the paper (here Q denotes the rational numbers):

"Harvey Friedman asked whether there exists a polynomial f(x, y) in Q(x, y) such that the induced map Q x Q --> Q is injective. Heuristics suggest that most sufficiently complicated polynomials should do the trick. Don Zagier has speculated that a polynomial as simple as x^7 + 3y^7 might be an example. But it seems very difficult to prove that any polynomial works. Both Friedman's question and Zagier's speculation are at least a decade old... but it seems that there has been essentially no progress on the question so far."

Poonen shows that a certain other, more widely-studied hypothesis implies that such a polynomial exists. Of course, such a polynomial does not exist if we replace Q by R, the reals. In fact any injection R x R --> R must be (very) discontinuous.

Suppose an injective polynomial f could be identified, answering Friedman's question; it might then be interesting to look at `recovery procedures' to produce x, y given f(x, y). We can't hope for x, y to be determined as polynomials in f(x, y), but maybe an explicit, fast-converging power series or some similar recipe could be found.

Finally, all of this should be compared with the study of injective polynomials from the Euclidean plane to itself; this is the subject of the famous Jacobian Conjecture. See Dick Lipton's excellent post for more information.


Monday, August 10, 2009

Wit and Wisdom from Kolmogorov

I just learned about a result of Kolmogorov from the '50s that ought to interest TCS fans. Consider circuits operating on real-variable inputs, built from the following gates:

-sum gates, of unbounded fanin;
-arbitrary continuous functions of a single variable.

How expressive are such circuits? Kolmogorov informs us that they can compute any continuous function of n variables (say, on the unit cube [0,1]^n). Wow! In fact, one can achieve this with poly(n)-sized, constant-depth formulas alternating between sums and univariate continuous functions (two layers of each type, with the final output being a sum).
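For concreteness, the representation usually quoted (I believe this is the standard modern form; the Phi's and phi's are the univariate continuous functions, and the inner phi's can even be chosen independently of f) is

\[ f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right). \]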

Note that sums can also be computed in a tree of fanin-two sums, so this theorem (whose proof I've not yet seen) tells us that fanin-two continuous operations capture continuous functions in the same way that fanin-two Boolean operations capture Boolean computation (actually, even more strongly in light of the poly-size aspect).

This result (strengthening earlier forms found by Kolmogorov and by his student Arnol'd) may, or may not, negatively answer one of Hilbert's problems, the thirteenth--it's not entirely clear even to the experts what Hilbert had intended to ask. But a fun source to learn about this material is a survey/memoir written by A. G. Vitushkin, a former student of Kolmogorov. [a gated document, unfortunately...]

The freewheeling article starts off with Hilbert's problem, but also contains interesting anecdotes about Kolmogorov and the academic scene he presided over in Moscow. Towards the end we get some amusing partisan invective against Claude Shannon, who, scandalously, "entirely stopped his research at an early stage and still kept his position of professor at the Massachusetts Institute of Technology". Vitushkin (who passed away in 2004) apparently bore a grudge over the insufficient recognition of Soviet contributions to information theory. My favorite story relates a not-so-successful visit by Shannon to meet Kolmogorov and features a deft mathematical put-down:

"Kolmogorov was fluent in French and German and read English well, but his spoken English was not very good. Shannon, with some sympathy, expressed his regret that they could not understand each other well. Kolmogorov replied that there were five international languages, he could speak three of them, and, if his interlocutor were also able to speak three languages, then they would have no problems."


Wednesday, May 20, 2009

Additive combinatorics: a request for information

Given a subset A of the integers mod N, we can ask, how many 4-element `patterns' appear in A? A pattern is an equivalence class of size-4 subsets of A, where two 4-sets S, S' are considered the same pattern if S = S' + j (mod N) for some j.

Clearly the number of patterns is at most |A|-choose-4; but it can be much less: if A is a consecutive block, or more generally an arithmetic progression, the number of patterns is on the order of |A|^3.
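To make the counting concrete, here is a brute-force pattern counter (a sketch of my own, fine only for small N; the example sets at the bottom are also my own choices):

import itertools

def count_patterns(A, N):
    # reduce each 4-subset to a canonical representative of its translation class mod N
    patterns = set()
    for S in itertools.combinations(sorted(A), 4):
        canon = min(tuple(sorted((x + j) % N for x in S)) for j in range(N))
        patterns.add(canon)
    return len(patterns)

N = 101
progression = [(7 * i) % N for i in range(20)]          # an arithmetic progression mod N
generic_set = [3, 5, 11, 29, 31, 41, 59, 67, 71, 83]    # a more generic-looking set
print(count_patterns(progression, N), count_patterns(generic_set, N))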

So my question is: if the number of patterns is `much less' than |A|^4, what nice structure do we necessarily find in A?

I believe that similar questions for 2-patterns have satisfactory answers: then the hypothesis is just that the difference-set (A - A) is small. In this case I believe A is 'close' to a (generalized) arithmetic progression, although actually I'm having trouble finding the relevant theorem here too (most references focus on sumsets (A + A), for which Freiman's Theorem applies).

Thanks in advance for any pointers!


Tuesday, March 24, 2009

Algebraic and Transcendental

A number is called algebraic if it is the root of a nonzero polynomial with integral coefficients (or, equivalently, rational coefficients); otherwise it is transcendental.

Item:
there exists a nonintegral c > 1 such that c^n 'converges to the integers': that is, for any eps > 0 there's an N > 0 such that c^n is within eps of some integer, for all n > N. (The golden mean phi = (1 + sqrt(5))/2 is indeed an example: phi^n + (1 - phi)^n is always an integer (the nth Lucas number), and |1 - phi| < 1, so phi^n drifts toward the integers. I can't find or remember the reference book at the moment.)
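A quick numerical check (floating point only, so don't push n too far):

import math

phi = (1 + math.sqrt(5)) / 2
for n in range(2, 31, 4):
    x = phi ** n
    # distance to the nearest integer shrinks geometrically in n
    print(n, round(abs(x - round(x)), 6))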

Item: it is apparently open whether any such number can be transcendental.

***

Next up, two questions of my own on algebraic numbers. Possibly easy, possibly silly, but I don't know the answers.

Define the interleave of two numbers b = 0.b_1b_2b_3... and c = 0.c_1c_2c_3... (both in [0, 1], and using the 'correct' binary expansions) as

b@c = 0.b_1c_1b_2c_2...
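Here is a finite-precision illustration of the operation (my own sketch; it interleaves truncated binary expansions, so it only approximates b@c):

import math

def interleave(b, c, bits=24):
    # interleave the first `bits` binary digits of b and c (both assumed in [0, 1))
    out = 0.0
    for k in range(1, bits + 1):
        b_k = int(b * 2**k) % 2      # k-th binary digit of b
        c_k = int(c * 2**k) % 2      # k-th binary digit of c
        out += b_k / 2**(2*k - 1) + c_k / 2**(2*k)
    return out

print(interleave(math.sqrt(2) - 1, math.sqrt(3) - 1))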

1) Suppose b, c are algebraic. Must b@c be algebraic?

2) The other direction. Suppose b@c is algebraic. Must b, c be algebraic?

In both cases, I am inclined towards doubt.

***

Final thoughts (and these are observations others have made as well)... computational complexity theory seems to have certain things in common with transcendental number theory:
-an interest in impossibility/inexpressibility results;

-markedly slow progress on seemingly basic questions--is (pi + e) irrational? does P = NP?

-attempts to use statistical and otherwise 'constructive' properties as sieves to distinguish simple objects from complicated ones.

So, could complexity theorists benefit from interacting and learning from transcendental number theorists? If nothing else, looking back on their long historical march might teach us patience and an appreciation for incremental progress.


Friday, October 31, 2008

Excitement and Probability

Elections, sporting events, and other competitions can be exciting. But there is also a sense in which they are almost always dull, and this can be proved rigorously. Allow me to explain.

(What follows is an idea I hatched at UCSD and described to Russell Impagliazzo, who had, it turned out, discovered it earlier with some collaborators, for very different reasons. I wouldn't be surprised if it had been observed by others as well. The proof given here is similar to my original one, but closer to the more elegant one Russell showed me, and I'll cite the paper when I find it.)

I want to suggest that a competition is dull to watch if one side is always heavily favored to win (regardless of whether it's the side we support), or if the two sides muddle along in a dead heat until some single deciding event happens. More generally, I'd like to suggest that excitement occurs only when there is a shift in our subjective probabilities of the two (let's say just two) outcomes.

Without defending this suggestion, I'll build a model around it. If anyone is upset that this notion of excitement doesn't include the smell of popcorn, the seventh-inning stretch, or any substantive modelling of the competition itself, there's no need to read any further.

Assume we are a rational Bayesian spectator watching a competition unfold, and that we receive a regular, discrete sequence of updates $X_1, X_2, \ldots, X_n$ (these update variables could be bits, real numbers, etc.). Let $p_0, p_1, \ldots, p_n$ be our subjective probabilities of 'victory' (fixing an outcome we prefer between the two) at each stage, where $p_t$ is a random variable obtained by conditioning on all the information $X_1, \ldots, X_t$ we've received at time $t$.

For $t > 0$, let's define the excitement at time $t$ as $EXC_t = |p_{t} - p_{t - 1}|$. This random variable measures the 'jolt' we presume we'll get by the revision of our subjective probabilities on the $t$th update.

Define the total excitement $EXC$ as the sum of all $EXC_t$ 's. Now, we only want to watch this competition in the first place if the expected total excitement is high; so it's natural to ask, how high can it be?

We needn't assume that our method for updating our subjective probability corresponds to the 'true' probabilities implied by the best possible understanding of the data. But let's assume it conforms at least internally to Bayesian norms: in particular, we should have $E[p_{t + 1} | p_{t} = p] = p$.

An immediate corollary of this assumption, which will be useful, is that
\[E[p_t p_{t + 1}] = \sum_p Prob[p_t = p]\cdot p E[p_{t + 1}|p_t = p]\]
\[= \sum_p Prob[p_t = p]\cdot p^2 = E[p_t^2].\]

OK, now rather than look at the expected total excitement, let's look at the expected sum of squared excitements, an often-useful trick which allows us to get rid of those annoying absolute value signs:
\[E[EXC_1^2 +EXC_2^2 + \ldots + EXC_{n }^2]\]
\[= E[(p_1 - p_0)^2 + \ldots + (p_n - p_{n - 1})^2 ] \]
\[= E[p_0^2 + p_n^2 + 2\left(p_1^2 + \ldots + p_{n - 1}^2\right) \]
\[ \quad{}- 2\left(p_1 p_0 + p_2 p_1 + \ldots + p_n p_{n - 1} \right) ] \]

\[= E[p_0^2] + E[p_n^2] + 2(E[p_1^2] + \ldots + E[p_{n - 1}^2]) \]
\[ \quad{} - 2(E[p_0^2] + \ldots + E[p_{n - 1}^2] )\]
(using linearity of expectation and our previous corollary). Now we get a bunch of cancellation, leaving us with
\[= E[p_n^2] - E[p_0^2].\]
This is at most 1. So if we measured excitement at each step by squaring the shift in subjective probabilities, we'd only expect a constant amount of excitement, no matter how long the game!

Now, rather crudely, if $Y \geq 0$ and $E[Y] \leq 1$ then $E[\sqrt{Y}] \leq 2$. We also have the general '$\ell_1$ vs $\ell_2$' inequality
\[|Y_1| + \ldots + |Y_n| \leq \sqrt{ n \cdot (Y_1^2 + \ldots + Y_n^2)} .\]
Using both of these, we conclude that
\[E[EXC_1 + \ldots + EXC_n] \leq E\left[\sqrt{n \cdot \left(EXC_1^2 + \ldots + EXC_n^2\right)} \right] \leq 2 \cdot \sqrt{n} .\]
Thus we expect at most $2 \sqrt{n}$ total excitement, for an expected 'amortized excitement' of at most $2/\sqrt{n} = o(1)$.
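As a sanity check on the $2\sqrt{n}$ bound, here is a small simulation (my own toy model: the 'competition' is $n$ fair coin flips, with side A winning on a strict majority, so the $p_t$'s form a genuine martingale):

import math, random

def win_prob(heads, flips_done, n):
    # P[strict majority of heads] given the flips so far (remaining flips are fair)
    rem = n - flips_done
    need = n // 2 + 1 - heads          # heads still needed for a strict majority
    if need <= 0:
        return 1.0
    if need > rem:
        return 0.0
    return sum(math.comb(rem, k) for k in range(need, rem + 1)) / 2 ** rem

n, trials, total_exc = 101, 200, 0.0   # n odd, so there are no ties
for _ in range(trials):
    heads, p_prev = 0, win_prob(0, 0, n)
    for t in range(1, n + 1):
        heads += random.random() < 0.5
        p = win_prob(heads, t, n)
        total_exc += abs(p - p_prev)   # EXC_t for this run
        p_prev = p
print("average total excitement:", total_exc / trials)
print("2 * sqrt(n):", 2 * math.sqrt(n))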

Watch $n$ innings, for only $O(\sqrt{n})$ excitement? Give me a break! If $n$ is large, it's better to stay home--no matter what the game.

I would love to see this theory tested against prediction markets like Intrade, which are argued to give a running snapshot of our collective subjective probability of various events. Are the histories as 'low-excitement' as our argument predicts? Even lower? (Nothing we've said rules it out, although one can exhibit simple games which have expected excitement on the order of $\sqrt{n}$.)

And if the histories exhibit more excitement than we'd predict (some sign of collective irrationality, perhaps), is there a systematic way to take advantage of this in the associated betting market? Food for thought. I'd be grateful if anyone knew where to get a record of Intrade's raw day-by-day numbers.

Finally, nothing said above rules out individual games playing out with high excitement, on the order of $n$, but it does say that such outcomes should be infrequent. I believe a more careful martingale approach would show an exponentially small probability of such large deviations (Russell said their original proof used Azuma's inequality, which would probably suffice).


Friday, May 23, 2008

A Must-See Site

It's called Theorem of the Day. I just found it, so I can't judge how accurate the frequency claim is, but Robin Whitty has stacked up an impressive collection of short, illustrated introductions to major theorems. I like his taste... two that were news to me and I especially liked in my first sampling: Nevanlinna's Five-Value Theorem and The Analyst's Traveling Salesman Problem.

Sign Robin's guestbook, spread the word, and recommend a theorem of CS theory! (I see at least 3 already, if you count unsolvability of Diophantine equations.)


Thursday, May 15, 2008

Random bits

This morning I was delighted to learn that Alan Baker, the Swarthmore professor whose Philosophy of Science class I took senior year, recently won the US Shogi championships--read his account here. Congrats, Prof. Baker!
I never learned Shogi, although Go--another Asian board game--was a big part of my youth and probably a decisive factor in my eventual interest in math/CS.

In other news, I am coauthor on a paper, 'The Power of Unentanglement', recently released on ECCC (and headed for CCC '08), jointly with Scott Aaronson, Salman Beigi, Bill Fefferman, and Peter Shor. It's about multiple-prover, single-round, quantum interactive proofs, and I encourage those whose cup of tea that is to take a look.

This is my first appearance in a conference paper, but not quite my first time in print--it happened once before, by accident. As a freshman in college, I asked my Linear Algebra professor, Steven Maurer, a question on an online class discussion board. Here it is, in the breathless and overwrought prose of my teenage years:

"This question has been haunting me, and I know I shouldn't expect definite answers. But how do mathematicians know when a theory is more or less done? Is it when they've reached a systematic classification theorem or a computational method for the objects they were looking for? Do they typically begin with ambitions as to the capabilities they'd like to achieve? I suppose there's nuanced interaction here, for instance, in seeking theoretical comprehension of vector spaces we find that these spaces can be characterized by possibly finite 'basis' sets. Does this lead us to want to construct algorithmically these new ensembles whose existence we weren't aware of to begin with? Or, pessimistically, do the results just start petering out, either because the 'interesting' ones are exhausted or because as we push out into theorem-space it becomes too wild and wooly to reward our efforts? Are there more compelling things to discover about vector spaces in general, or do we need to start scrutinizing specific vector spaces for neat quirks--or introduce additional structure into our axioms (or definitions): dot products, angles, magnitudes, etc.?

Also, how strong or detailed is the typical mathematician's sense of the openness or settledness of the various theories? And is there an alternative hypothesis I'm missing? "


Steve gave a thoughtful reply, years passed, and then a similar topic came up in conversation between him and the mathematician and popular math writer Philip J. Davis. The conversation sparked an essay by Davis in which he quoted Maurer and me (with permission), the essay recently became part of a philosophy-of-math book ('Mathematics and Common Sense: A Case of Creative Tension'), and I got mailed a free copy--sweet! Once again, I recommend the book to anyone who enjoys that kind of thing. The essay is online.


Friday, April 04, 2008

Some babies grow in a peculiar way

Today I bumped into an order of growth I hadn't seen before, and thought I'd share it for a modest bit of mental aerobics.

Readers may well have seen functions of form

f(n) = (log n)^c,

known as 'polylogarithmic'. These are important in, e.g., query complexity, where we like the number of queries to be much smaller than the input size n when possible. Also emerging from such studies are the 'quasipolynomial' functions, of form

g(n) = 2^{(log n)^c}.

As a warmup--how fast do these things grow?


OK, now the main course tonight is the following:

h(n) = 2^{2^{(log log n)^c}}.

What do you make of these? And what questions need answering before we understand such a growth rate 'well enough'? I'm unsure and would love to hear your thoughts.
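To at least get a numerical feel for the scales involved, here is a tiny script (my own choices: c = 2 and all logs base 2; it prints log2 of each function, since the values themselves are far too large to write out):

import math

c = 2
for log_n in (2**10, 2**16, 2**22):          # three huge values of n, given via log2(n)
    log_f = c * math.log2(log_n)             # f(n) = (log n)^c
    log_g = log_n ** c                       # g(n) = 2^{(log n)^c}
    log_h = 2 ** (math.log2(log_n) ** c)     # h(n) = 2^{2^{(log log n)^c}}
    print(f"log n = 2^{int(math.log2(log_n))}: log f ~ {log_f:.0f}, "
          f"log g ~ {log_g:.3g}, log h ~ {log_h:.3g}")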


Friday, March 21, 2008

News, and some Number Theory

Time for a personal update: I'm enjoying myself greatly here in Cambridge, and was recently admitted as a transfer student to MIT's EECS department. The move has both personal and academic advantages for me, but let me emphasize that I think UCSD has a lot to offer prospective students, and in addition is home to many cool people whom I miss... I would be happy to offer advice about either school.

When I started writing here as an undergrad, I knew very few people in the worlds of TCS and Complexity, and the blog was a way of reaching out to a community I wanted to join. Now, thanks in part to blogging (thru which I met Scott, my advisor), I'm immersed in that community at a school with a large, vibrant Theory group. This is to say that, though writing here still interests me, it no longer answers an urgent personal need, and I am most likely to post new material in response to a request for specific topics or post types.


Today I was perusing a favorite book of mine, 'Gems of Theoretical Computer Science' by U. Schoning & R. Pruim. One chapter partially treats the undecidability of determining whether systems of Diophantine equations (polynomial equations where solutions are required to be integral) have a solution. This was one of Hilbert's problems from 1900, solved in 1970 and full of deep math.

The authors pose the following exercise: show that the version where variables must take nonnegative values is reducible to the integer version. (And vice versa; also, the rational-values version is reducible to the integral version; the converse appears to be open...)

Think about it...



The reduction they give: Given a set of polynomial equations {P_i(X_1, ... X_k)}, we want to determine if they're simultaneously satisfiable with nonnegative values. Introduce, for each j <= k, the variables Y_{j, 1}, Y_{j, 2}, Y_{j, 3}, Y_{j, 4}.

Now add to the system, for each j <= k, the constraint that

X_j = Y_{j, 1}^2 + Y_{j, 2}^2 + Y_{j, 3}^2 + Y_{j, 4}^2.

Claim: the new system is satisfiable over the integers iff the original one is satisfiable over the nonnegative integers! The proof, of course, is an easy consequence of Lagrange's Theorem that every nonnegative integer is the sum of 4 integer squares.
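For concreteness, here is a small sketch of the reduction (it assumes sympy is available, and it substitutes the four-square expression directly into the equations, which is equivalent to adding the constraints above; the example system at the bottom is my own):

import sympy as sp

def to_integer_version(equations, variables):
    # Replace each nonnegative variable X_j by a sum of four integer squares.
    subs, new_vars = {}, []
    for X in variables:
        Ys = sp.symbols(f'{X}_1:5')      # four fresh variables per original variable
        new_vars.extend(Ys)
        subs[X] = sum(Y**2 for Y in Ys)
    return [sp.expand(eq.subs(subs)) for eq in equations], new_vars

X, Y = sp.symbols('X Y')
new_eqs, new_vars = to_integer_version([X**2 + Y - 7], [X, Y])
print(new_eqs)   # solvable over the integers iff the original system
                 # was solvable over the nonnegative integers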

So, my question is: could a reduction be found that doesn't rely on Lagrange's Theorem, or that relies only on one of its weaker variants, where the constant 4 is replaced with some constant c > 4? Maybe for some constant c the proof is so simple that I'd be satisfied we are performing this reduction with the cheapest possible tools.

If our plan in the reduction is to constrain the original variables in a form analogous to the above, namely

X_j = Q_j(Y, Z, ...),

where Y, Z, ... are new integral variables, is there any way around proving a version of Lagrange's Theorem? Generally we show that polynomials are identically nonnegative by expressing them as a sum of one or more squares, e.g.,

Y^2 + Z^2 - 2YZ = (Y - Z)^2.

Using this template, we'd be reliant on Lagrange. However, in his thesis, Minkowski conjectured that there exist nonnegative real polynomials *not* so expressible, and this was proved by Hilbert. A simple example I found online is

Q(X, Y, Z) = X^4*Y^2 + Y^4*Z^2 + Z^4*X^2 - 3X^2*Y^2*Z^2,

and another counterexample is derived systematically in the classy and useful inequalities book 'The Cauchy-Schwarz Master Class' by J.M. Steele. (Artin proved, however, that every poly that's identically nonnegative is expressible as the sum of finitely many *rational* functions squared.)

Could it be that one of these creatures is easier to use in the reduction (both to prove it's nonnegative, and ranges over all nonnegative values)? Somehow I doubt it. Anyways, I just wanted to point to an instance where a reduction one would expect to be straightforward seems to require fairly nontrivial understanding of the problem domain.


Monday, December 24, 2007

That's a Wrap

I had some gift-giving duties to attend to today... halfway into the wrapping phase, my mind wandered in a predictable direction:

Given a set of rectangular box dimensions, what is the smallest amount of paper that will wrap the box?

One would assume we want to cut out a rectangle of wrapping paper, since otherwise we get annoying scraps (and also the problem is mathematically trivial if we can cut a cross-type shape).

I haven't had any time to toy with this problem, but I had a feeling one particular MIT dude might have... and I was right. True to form, Erik Demaine delivers a whole page about various wrapping problems he's explored with colleagues.

To anyone reading who teaches a programming course, I would suggest that a fun algorithms assignment could be spun out of problems of the above type. If the program outputted folding instructions too, so much the better.

Happy holidays, all!


Tuesday, October 09, 2007

Random Freaky Facts

While it may not be at all apparent, I've been somewhat particular about the types of posts I make to this blog. I've realized, however, that there should be no real harm in loosening up, and it'd probably increase my post frequency. So today, let me report two theorems that caught my eye in recent weeks. If I had to find a common theme, it'd be: finding interesting constraints in seemingly very unconstrained settings.

1) Let G be an infinite (undirected, simple) graph. Let the 'density' d(H) of a finite subgraph H of G be the fraction of pairs of vertices (u, v) from H that are connected by an edge. Let the 'upper density' d*(G) of G be the greatest number p such that we can find arbitrarily large finite subgraphs H, with d(H) as close as desired to p.

(i.e. for all n, epsilon there exists an H with |H| > n, d(H) > p - epsilon.)

Then, bizarrely, d*(G) must be one of the following numbers: {0, 1/2, 2/3, 3/4, 4/5, 5/6, ..., 1}

This result follows from the 'Erdos-Stone Structure Theory', generalizing Turan's Theorem in extremal graph theory. It is presented, in context, in Bollobas' excellent textbook 'Modern Graph Theory'. Basically, the idea is that, if H is large enough and of density greater than, e.g., 5/6 + epsilon, it must contain a particular subgraph H', of density 6/7.


2) Let X be a probability distribution over R^d. Choose d+1 points independently, each distributed as X. Then the probability p that their convex hull contains the origin, cannot exceed (1/2)^d.

Wild!
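If you'd like a quick sanity check, here is a Monte Carlo sketch in d = 2 (the standard Gaussian distribution is my own choice; for a centrally symmetric distribution the answer should come out right at the (1/2)^d = 1/4 bound):

import random

def origin_in_triangle(a, b, c):
    # the origin lies inside triangle abc iff the cross products a x b, b x c, c x a
    # all have the same sign
    cross = lambda p, q: p[0]*q[1] - p[1]*q[0]
    s1, s2, s3 = cross(a, b), cross(b, c), cross(c, a)
    return (s1 > 0 and s2 > 0 and s3 > 0) or (s1 < 0 and s2 < 0 and s3 < 0)

trials, hits = 100_000, 0
for _ in range(trials):
    pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)]
    hits += origin_in_triangle(*pts)
print(hits / trials)   # roughly 0.25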

Unfortunately, I forget where I saw this last one, or who it's due to, but if I find out I'll attribute it here. I'm, uh, 90% sure I've got the numbers in the statement correct.


Sunday, September 16, 2007

Information is not a Substance

I wanted to share a cool idea from the young field of 'network coding', which explores techniques for efficiently disseminating information over networks like the web. I believe it originated in this seminal paper, which I learned about indirectly by reading Michael Mitzenmacher's very interesting research blog My Biased Coin (which reflects his professional activity in network coding and is a good source; ref. #7 on this link is a good first article to read).

Consider the following directed network:



Here, each of the three top nodes has a stream of data it needs to communicate to the node of corresponding color on the bottom. Each edge shown can transmit a bit per second in the direction indicated, and the purple nodes in the middle are willing to assist in communication.

The perverse structure of this network seems to pose a problem: each node on top has two pink edges which are headed towards the wrong parties on the bottom. Since bottom nodes cannot themselves communicate, the pink edges appear useless. It would seem that all information flow must happen thru the black edges, whose bottleneck would allow only one party to get a bit across each second.

WRONG! (puzzle: stop reading and figure out a better strategy.)


Here's the idea: Say the top nodes want to send bits x, y, z respectively on a round. They each send their bit on every available edge. The top purple node receives all three and computes their XOR, that is, x + y + z (mod 2). It sends this to the bottom purple node, which then broadcasts it to the three receivers.

Then, for instance, bottom-red receives y, z, and x + y + z, so it can add up all the bits it receives to get y + z + (x + y + z) = x, just what it wanted. Similarly for the other nodes. Repeating this process, the network transmits at three times the rate we were led to expect by our faulty initial reasoning (a line of thought which restricted us to the best so-called 'multicommodity flow' on the network).
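Here is a minimal simulation of the trick on the network described above (mod-2 arithmetic throughout; the variable names are my own):

import random

x, y, z = (random.randint(0, 1) for _ in range(3))

# "Pink" edges deliver each source's bit to the two receivers that *don't* want it;
# the "purple" relay path carries the single bit x XOR y XOR z to all three receivers.
relay = x ^ y ^ z

red_gets   = (y, z, relay)    # red wants x
green_gets = (x, z, relay)    # green wants y
blue_gets  = (x, y, relay)    # blue wants z

decode = lambda bits: bits[0] ^ bits[1] ^ bits[2]
assert (decode(red_gets), decode(green_gets), decode(blue_gets)) == (x, y, z)
print("all three receivers recovered their intended bits")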

If these were roads, and we were sending formed substances to their destinations, our first approach would be valid. But information is not a substance--it obeys zany laws of its own, and can at times be merged, fractionalized, and delocalized in unexpected and useful ways (often involving that mischievous XOR). Better understanding the information capacity of networks more complex than the one above is a fascinating and important research frontier--so keep following Mitzenmacher's blog!


Friday, September 14, 2007

Be Rational

Today, enjoy a home-baked puzzle in real analysis--a somewhat old-fashioned subject that I can't help but love (much like automata theory).

We know that pi = 3.1415... is transcendental, i.e., it is not a zero of any nontrivial univariate polynomial with rational coefficients. Is it possible we could generalize this result?


Part I. Is there a continuous function f: R --> R, such that

a) f takes rational numbers to rational numbers,

b) f(x) is zero if and only if x = pi?


Part II. If you think no such f exists, then for which values (in place of pi) does such an f exist?
On the other hand, if f as in Part I does exist (I reveal nothing!), can it be made infinitely differentiable?


For those new to real analysis, 'weird' objects like the desired f above generally have to be created gradually, through an iterative process in which we take care of different requirements at different stages. We try to argue that 'in the limit', the object we converge to has the desired properties. (Does this sound like computability theory? The two should be studied together as they have strong kinships, notably the link between diagonalization/finite extension methods and the Baire Category Theorem.)

Passing to the limit can be subtler than it first appears; for example, a pointwise limit of continuous functions need not be continuous. Here is a pretty good fast online introduction to some central ideas in studying limits of functions.


Monday, August 27, 2007

Loopy Thinking

Earlier I posted on the 'backwards orientation' in mathematics, aimed at finding new characterizations or 'origin stories' for familiar objects. Here is an exercise in this kind of math, hopefully amusing.

We'll consider the class of simple, rectifiable closed curves in the plane, that is, non-self-intersecting continuous 'loops' in R^2 to whose segments a definite finite arc-length can always be assigned.

Given two points x, y on such a curve C, let d_C(x, y) denote the distance along the curve from x to y (in whichever direction is shorter). On the other hand, let d(x, y) denote regular Euclidean distance.

We say that a curve respects distances if, for all u, v, x, y on C, we have

d_C(u, v) < d_C(x, y) if and only if d(u, v) < d(x, y).

Now, a circle respects distances in this sense; do any other curves in our class?

Would your answer change if we used a weaker notion of respecting distances? Namely, for all x, y, z in C,

d_C(x, y) < d_C(x, z) iff d(x, y) < d(x, z).


Bonus problem: the analogous question for rectifiable-path-connected, closed plane sets with nonempty interior. Convex bodies respect distances; do any others? (Recall that we previously discussed a very different kind of characterization of convexity.)
It may help to know that between any two points in such a set, there exists a path of minimum length (see Kolmogorov & Fomin's book on functional analysis, Sect. 20, Thm 3.).


Wednesday, August 08, 2007

Another Public Service Announcement

We've all heard that randomness and probability are becoming more and more important in computer science. A number of good references exist for learning the basic techniques of probabilistic analysis. Many of these techniques are radically simple and incredibly powerful, even for solving problems where probability does not seem to play a role (such as design or optimization problems where what is wanted is a highly 'uniform' or 'balanced' structure, e.g. in exhibiting Ramsey graphs, expanders/extractors, low-discrepancy sets, etc.). As such they are a very good investment. One can go very far, for instance, granted only familiarity with the meaning and some applications of the following simple concepts:

-linearity of expectation

-the second moment (a.k.a. variance) method

-union bounds

-Chernoff-style deviation bounds

As I've said in the past, The Probabilistic Method by Alon and Spencer is the most delightful way to develop an appreciation for this stuff. I have blogged before on some of these ideas, and will happily do so again, but today I want to discuss an aspect of probabilistic analysis that receives somewhat less thorough discussion in the references I've consulted.

Namely: given two random variables X, Y, how should we quantify their similarity or difference? As an example of a case where such a question might be important, suppose we have a randomized algorithm that correctly solves some problem with high probability if given access to fair coin flips, but we don't have a 'perfect' source of random bits; we want to find conditions under which an imperfect or pseudorandom source will be 'close enough' to the uniform distribution to play the role of the algorithm's random bits, without destroying its correctness.

Is there one 'right' notion of similarity, or do we need multiple notions depending on application? The answer is, 'both'. There is a very widely useful quantity, called the 'statistical distance', which is effective in many cases, but to recognize these cases it is important to have several perspectives on the matter.

I will present the definition and three variants. I encourage readers to submit any others, and also to point me to official nomenclature and references for the variants (I'm working from memory).

Let X, Y be random variables. Let's say they take on (real-number) values from the finite set {a_1, a_2, ... a_n}, just to keep the discussion nice and safe. Suppose X, Y take on value a_i with probabilities p_i, q_i respectively.

The 'statistical distance' d(X, Y) is then defined as

d(X, Y) = .5 * (sum from i = 1 to n of |p_i - q_i|).

(That's an absolute value in there.)

Also denoted ||X - Y||, the statistical distance should not be confused with the L_1 distance, which is just twice the statistical distance when applied to probability vectors.

Readers should verify the following: d(X, Y) is a metric, taking values between 0 and 1. d(X, Y) = 0 if and only if X, Y are identically distributed, and d(X, Y) = 1 if and only if X, Y have disjoint supports (no value arises with positive probability under both).

Now consider the three following questions we could ask about X and Y:


1) Say we want to build a 'distinguisher' to tell apart the two variables: that is, a function A: {a_1, ... a_n} --> {0, 1} that maximizes the quantity
|P[A(X) = 1] - P[A(Y) = 1]|. (I believe this is the 'distinguishing probability' of A, but don't quote me.) Call Q1(X, Y) the maximum quantity achievable in this way.

(Note that we could allow A to use additional randomness; it wouldn't affect the value Q1. This is important in applications.)


2) Now say a friend secretly chooses one of the two variables X or Y, each with probability 1/2. She then takes a random sample from the chosen variable and presents it to us. We are asked to guess which sample it came from, and we wish to find a rule that'll maximize our probability of guessing right.
Let Q2(X, Y) denote the maximum achievable probability of correctness (which again is unaffected by the use of randomness in guessing).


3) Finally, say we wish to design a random variable Z = (Z_1, Z_2), outputting a pair of real values. We require that Z_1 be distributed identically to X
(i.e. d(X, Z_1) = 0), and that Z_2 be distributed identically to Y, but we do not require independence. Our goal is to design the pair (Z_1, Z_2) (called a 'coupling' of X and Y) to maximize the probability that Z_1 = Z_2. Denote by Q3(X, Y) this maximum achievable probability.


Now, it is a pleasant fact that all of Q1, Q2, Q3 are completely determined by the value d(X, Y), and conversely, they also each determine d(X, Y) (and each other). Moreover, the quantities are all linearly related, so if you forget the precise relationships (as I invariably do, which motivated this post), they're easy to rederive by considering the extreme cases and interpolating. For the record, here are the relationships:

Q1(X, Y) = d(X, Y)

Q2(X, Y) = .5 + .5 * d(X, Y)

Q3(X, Y) = 1 - d(X, Y)

I recommend verifying these facts once or twice and then taking them for granted. Note that the first two (of which Q1 is more frequently used) are measures of distance, while Q3 is a measure of similarity. Both perspectives come in handy.
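For a quick numeric check of these relationships, here is a small sketch on a toy pair of distributions (the distributions are my own choices; the explicit optimizers below use the standard facts that the best distinguisher accepts exactly where p_i > q_i, and the best coupling puts mass min(p_i, q_i) on the diagonal):

p = [0.5, 0.3, 0.2]    # distribution of X over {a_1, a_2, a_3}
q = [0.2, 0.3, 0.5]    # distribution of Y

d  = 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))
Q1 = sum(pi - qi for pi, qi in zip(p, q) if pi > qi)   # best distinguisher: output 1 where p_i > q_i
Q2 = sum(0.5 * max(pi, qi) for pi, qi in zip(p, q))    # guess whichever variable makes the sample likelier
Q3 = sum(min(pi, qi) for pi, qi in zip(p, q))          # best coupling: overlap mass stays on the diagonal

print(d, Q1, Q2, Q3)
print(abs(Q1 - d) < 1e-12, abs(Q2 - (0.5 + 0.5 * d)) < 1e-12, abs(Q3 - (1 - d)) < 1e-12)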

Note too that, while d(X, Y) is defined by an equation, Q1-Q3 are defined by an ability to do something; Q1-Q2 in particular are framed around an agent's ability to distinguish between cases, and it is natural that computer scientists often prefer to take such a view.

In future posts, I hope to elaborate somewhat on these perspectives (couplings in particular, which are probably the most interesting).


Sunday, July 15, 2007

A Tricky Vote

An election is being held to select one of two candidates, A or B. n voters turn out in a long, orderly line. A motley crew of pollsters show up to predict the outcome, but none is diligent enough to survey the entire line, or savvy enough to conduct a proper statistical sample. Instead, each pollster arbitrarily selects some single interval of the line and queries everyone in that interval. The intervals may be of various sizes, and they may overlap.

After comparing their haphazard data, the pollsters notice that candidate A, the presumptive favorite, has more than two-thirds support in every survey interval.

Assume the process is 'nice' in the following senses:

-Everyone truthfully reveals their intentions to the pollsters, and intentions don't change;
-Everyone in line eventually votes (once) for their intended candidate, and votes are faithfully counted;
-No one changes position in the line.

Also assume that everyone is surveyed at least once.

Can you conclude that candidate A is the winner?


Note that 2/3 is the lowest threshold one could hope for: consider 4 voters, with preferences going B A A B, and with 2 survey intervals consisting of the first three and last three voters respectively; then A has exactly 2/3 support in each, yet the outcome is a tie.

If this problem gives you trouble, try solving it first with 'two thirds' replaced by 'ninety-nine percent'.

I enjoy questions like this one, that try to deduce 'global' conditions from a collection of 'local' observations. With the advent of 'property testing' in computer science, they have become increasingly important to the field. I hope to make further posts on the subject soon, and possibly also discuss generalizations of the above problem (Does anyone know if/where it's been studied before?)

Update, 9/27: I've since learned that this problem falls in well-trod territory of functional and harmonic analysis; specifically, the (affirmative) answer to this problem essentially states that a certain 'maximal operator' over a function space is 'bounded'. The solution techniques described by David and myself in the comments are standard and fall under the rubric of 'covering theorems', the most basic one being due to Vitali. I'll try to blog more soon on what I've learned about this problem, which gets very interesting in 2 dimensions...


Monday, July 09, 2007

Guardian Angel Decoding

In the conventional setting of error-correcting codes (ECCs), there are three parties: Sender, Receiver, and Adversary. Sender wants to send a message m to Receiver; problem is, Adversary may intercept the message string and modify as many as e > 0 symbols before Receiver gets the message.

The idea of an error-correcting code is to encode the message m in a redundant fashion as some 'codeword' C(m), so that even after as many as e errors are introduced, the original message m is uniquely recoverable.

For a simple example (not representative of the sophistication of more powerful codes), say Sender wants to transmit a single bit of information, and e = 1. Then we could have C(0) = 000, C(1) = 111, and Receiver can correctly recover the message bit by taking the majority vote of the 3 bits received.

For a much more comprehensive introduction of ECCs from a TCS perspective, see e.g. this survey by Sudan, or this one by Trevisan.


Now here's a slightly whimsical idea: suppose we add a fourth party to the scene, a Guardian Angel allied with Sender and Receiver. The Guardian Angel watches Sender encode the message; though it cannot completely stop the Adversary from making its modifications to the codeword, it can step in to prevent as many as d < e of these modifications. These 'saves' are performed all at once, after the Adversary reveals its intended modifications, and the Angel can choose which sites to protect; it cannot make new modifications of its own.

Clearly the Guardian Angel is a boon. Consider an ECC in the standard setting (no Angel) with an encoding/decoding system that successfully recovers messages even after as many as e errors are introduced; the same protocol can recover the message with as many as e + d errors, with the help of a 'lazy Angel' who arbitrarily selects d of the adversary's errors and corrects them.

So here's the question: is this as good as it gets? Or can the Angel act more cleverly to help Sender and Receiver succeed in the presence of even more errors?

Note that we might want to change the encoding/decoding protocol itself, to optimize for the Angel's presence. For example, suppose d = 1; then we can successfully transmit a single bit in the presence of any number of errors, by having Sender set the parity of the transmitted string to the intended bit and having the Angel preserve that parity (convince yourself this works). The challenge is to use the Angel effectively while preserving the information rate and other positive properties of our ECC, as much as possible.
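Here is a tiny simulation of that parity trick (my own toy setup: Sender fixes the parity, the Adversary flips an arbitrary subset of positions, and the Angel cancels one flip whenever the flip count is odd):

import random

def transmit_one_bit(b, n=16):
    # Sender: any n-bit string whose parity equals the intended bit b.
    sent = [random.randint(0, 1) for _ in range(n - 1)]
    sent.append((b - sum(sent)) % 2)

    # Adversary: flips an arbitrary subset of positions.
    flips = random.sample(range(n), random.randint(0, n))

    # Angel (d = 1): if the adversary's flip set has odd size, cancel one flip,
    # so the delivered string always has the same parity as the sent one.
    if len(flips) % 2 == 1:
        flips = flips[1:]

    received = list(sent)
    for i in flips:
        received[i] ^= 1
    return sum(received) % 2    # Receiver: decode the parity

assert all(transmit_one_bit(b) == b for b in (0, 1) for _ in range(500))
print("bit recovered correctly in every trial")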

This is meant as a puzzle, but a somewhat open-ended one (in particular, one that ought to be investigated in the context of specific ECCs) that feeds into active research in coding theory. I'm hoping it will stimulate readers to rediscover some of the important notions in the area. I'll give hints and/or literature pointers sometime soon.

***

...Ok, here goes:

Hint: 'List Decoding' is a concept which has seen a lot of applications recently. Its starting point is the realization that, if we are willing to not always uniquely recover the codeword after errors are introduced, but instead narrow down the possibilities to a 'short' list of candidate messages, we can, for many natural codes, tolerate many more errors. For instance, with a binary alphabet, we cannot achieve unique decoding for nontrivial codes beyond the threshold of a .25 fraction of errors. On the other hand, if we allow list decoding (say, returning a poly(n)-sized list of candidate messages given a corrupted codeword of length O(n)), then we can tolerate almost a .5 fraction of errors.

For more discussion and references, see this post of Lance's.

So: does list decodability of a code imply unique decodability with the help of a Guardian Angel, perhaps an Angel making only a relatively small number of 'saves'? And if the list decoding procedure is computationally efficient, can the Guardian Angel and its associated decoder be efficient as well?

***

For a more thoroughly whimsical puzzle scenario, check out Conway's Angel Problem.


Wednesday, May 30, 2007

Flooding and Kleitman's Theorem

In their classic, and highly entertaining, survey book on probabilistic methods in combinatorics, Alon and Spencer pose the following question: suppose we generate a random graph on n vertices, independently including each edge with probability 1/2. Suppose we do not look at the graph right away, but are informed that the graph we got contains a Hamiltonian path, that is, a path visiting each vertex exactly once.

Does this knowledge make it more or less likely that the graph is nonplanar?

Of course, it should make it more likely. Please stop and verify that this ought to be true. Can you prove it?

Results that prove facts like this are called 'correlation inequalities', and the simplest one that applies here is called Kleitman's Theorem. (In Alon and Spencer, Kleitman's Theorem is derived from more general results, but I encourage you to try to prove it directly.)

Say a subset A of the Boolean cube is monotone if, whenever x is in A and y >= x (bitwise domination), then y is in A too. We may also say that A is 'upwards closed'.

Kleitman's Theorem states that, given two monotone subsets A, B of the Boolean cube, if we pick a random cube element z, the chance of z being in B cannot decrease if we condition on the event that z is in A.

So, in the example, let the bits of the cube represent the presence/absence of edges in a graph of fixed size. Let A and B be the properties of Hamiltonicity and nonplanarity, respectively. Then Kleitman yields the intuitive fact we were looking for.
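If you want to see the inequality itself in action on a small case, here is a brute-force check on the 3-dimensional cube (exhaustive over all monotone subsets; purely my own verification sketch):

from itertools import product

cube = list(product((0, 1), repeat=3))

def is_monotone(A):
    # upwards-closed: every coordinatewise-dominating point of a member is a member
    return all(y in A for x in A for y in cube
               if all(yi >= xi for yi, xi in zip(y, x)))

candidates = [frozenset(cube[i] for i in range(8) if mask >> i & 1)
              for mask in range(1 << 8)]
monotone = [A for A in candidates if is_monotone(A)]

for A in monotone:
    for B in monotone:
        if A:   # condition on a nonempty event
            # Pr[z in B | z in A] >= Pr[z in B], for z uniform on the cube
            assert len(A & B) / len(A) >= len(B) / len(cube)
print(len(monotone), "monotone sets checked; Kleitman's inequality holds in every case")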

***

Now I am going to describe how Kleitman's result solves the previous flooded-city puzzle I posed last time. In fact, trying to reprove Kleitman's Theorem was what led me to pose the flooding puzzle in the first place, although I now tend to think of Kleitman's as the logically prior result.

To do this requires an easy generalization of Kleitman's Theorem, where we replace the boolean cube in the statement with {0, 1, ... m}^n. So, e.g., vector (3, 4, 5) dominates (3, 3, 4), but is incomparable with (4, 0, 0). In fact, we'll only need this result for n = 2.


Create a bipartite graph U x F, with unflooded houses on the left and flooded houses on the right. Add an edge from an unflooded house h to a flooded house h' if there exists an h-to-h' path with all steps running north or east.

Given a set S of unflooded houses, let N(S) denote the flooded houses adjacent to some house of S in the bipartite graph.



In the example above, the green-circled houses are S and the orange-circled flooded houses are N(S).

Each unflooded house contains a unit amount of water to be delivered, while each flooded house needs an amount |U|/|F| of water delivered, which can only be delivered from a house that is adjacent in the graph we defined. Then, by a version of Hall's Theorem, not difficult to derive from the original formulation, the delivery plan we seek exists if and only if for every subset S of U, the total water-need of N(S) meets or exceeds the amount of water available from S.

Suppose that the desired allocation does not exist. Then by Hall's Theorem, there exists some subset S of U such that the water supplied by S exceeds the need of N(S).

We take such an S and iteratively add into S all the unflooded north- and east-neighbors of houses in S. This doesn't increase the size of N(S), but it does increase the amount of water supplied by S, so the resulting modified S is still a witness to the impossibility of our desired allocation.

We repeat this until the union of S and N(S) is upwards-closed. Call the resulting set S'. Since S' supplies more water than N(S') demands, the ratio |S'|/|N(S')| exceeds |U|/|F|.

If S were as in the example above, S' would be the purple-circled set of houses below:



This means that, if we pick a house at random from within the union of S' and N(S'), we are less likely to get a flooded house than if we picked a house uniformly at random from the whole city. This means that, in picking uniformly from the whole city, the (upwards-closed) event of getting a flooded house is negatively correlated with the upwards-closed event of getting a house from the union of S' and N(S')--a contradiction to the generalized Kleitman's Theorem.

We conclude that the desired delivery plan exists, and can be found in polynomial time using known algorithms for bipartite matching.


Monday, May 14, 2007

The 'Sack of Potatoes' Theorem

Tired of packing ham sandwiches for lunch every day? Have some spuds for a change.

You've got a countable collection A_1, A_2, A_3, ... of potatoes. A potato is a convex body of diameter at most 1.

(That is: if p, q are points in a potato P, P also contains the entire line segment between them; furthermore, p, q are at a distance at most 1 from each other.)

Furthermore, most of these spuds are incredibly dinky: the total volume of the entire collection is at most some finite value v, say, 4 cubic inches. You'd really like to eat them all.

Trouble is, mashing or cutting them up beforehand would ruin their freshness. You want to bring them all to work whole in your lunchbox.

Prove that these spuds can be packed into a box of finite dimensions. In fact, such dimensions can be computed solely in terms of v, independent of the precise shape of the potatoes.

***

How to get started on a puzzle like this? One should try to find a sequence of reductions to simplify the problem. So, first, suppose we could do it for rectangular potatoes. Show how to pack a single potato of volume c into a box of volume at most a constant multiple of c, also with bounded diameter. Then, if you can pack these boxes (which have finite total volume), you've packed the potatoes they contain.

I solved the 2-D case, but haven't yet worked on 3-D or n dimensions (where it is also true). I read about this result in a charming volume: The Scottish Book, a collection of research problems (some simple, some still open) compiled by high-caliber, and very imaginative, mathematicians, including Ulam, Banach, and Mazur, who met regularly in the 'Scottish Coffee House', confusingly located in Lwow, Poland, during the '30s. Their book of problems was kept there by the management and notes the prizes--'A bottle of wine', etc.--offered for various solutions. It's definitely worth looking over, both for its diverse contents and for the image of mathematical community it offers.


Thursday, May 03, 2007

The Antichain King

Some readers should be familiar with the classic Kruskal-Katona Theorem. Well, legend came to life today... Gyula O. H. Katona (of the Renyi Institute in Hungary) just came to our department and gave a talk about his main area of expertise, extremal set theory.

Sperner's Theorem was the result that launched this now-sprawling field. It addresses the following question: How large a family of subsets {S_1, ... S_m} of {1, 2, ... n} can we find, such that no S_i contains another S_j? (Such a family is called an antichain in the lattice of subsets of {1, 2, ... n} partially ordered by set inclusion.)

Sperner states that the collection of all sets of size (floor-of-) n/2 gives the biggest such antichain you could hope for. It's clear that this set is a maximal antichain (can't be augmented); the claim is that it's also globally optimal.

Katona mentioned some nice applications of Sperner's Theorem. Here's a cute one: given a set of real numbers all greater than 1, use Sperner's Theorem to bound the number of subset sums taking on a value in any interval [x, x+1), x a positive real. This bounds the concentration of the random variable that outputs a randomly chosen subset-sum.
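Here is a brute-force illustration of that bound (my own sketch, using random reals greater than 1; since adding any element increases a sum by more than 1, the subsets whose sums land in a common unit interval form an antichain, so Sperner caps their number at n-choose-n/2):

import bisect
import random
from itertools import combinations
from math import comb

nums = [1 + 2 * random.random() for _ in range(12)]   # all strictly greater than 1
sums = sorted(sum(c) for r in range(len(nums) + 1)
              for c in combinations(nums, r))

# The most subset sums falling in any half-open unit interval [x, x + 1);
# it suffices to check intervals whose left endpoint is itself a subset sum.
best = max(bisect.bisect_left(sums, s + 1) - i for i, s in enumerate(sums))
print(best, "<=", comb(len(nums), len(nums) // 2))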

He mentioned many generalizations as well. Sperner's Theorem can be viewed as a statement about the number of sets necessary to guarantee an ordered pair, but we can ask about guaranteeing ordered triples, diamonds, or any poset P for that matter. He and colleagues have results of this form, and he conjectures that for any poset P, there exists a constant c depending only on P such that, for all n, there exists a maximum P-excluding family of subsets of {1, 2, ... n}, such that all sets in the family have size in the range (n/2 - c(P), n/2 + c(P)). So, the structure of optimality in Sperner's result could extend way further (Erdos proved a result of this type for all chains P).

Finally, Katona mentioned the 'group testing' problem that first drew him into this whole web of intrigue:

Given n, r, there are n boxes, r < n of which contain (say) radioactive material. We can in one step test any subset of the boxes for presence of radioactivity (but get no sense of how many radioactive boxes are in the group, just 'some' or 'none'). How many tests are necessary to determine the r hot boxes?

Sperner theory is relevant to the nonadaptive-query case.

Well? Can you see such an elegantly posed question and not try your hand at it? Thanks, Professor Katona!
