Andy's Math/CS page

Sunday, July 15, 2007

A Tricky Vote

An election is being held to select one of two candidates, A or B. n voters turn out in a long, orderly line. A motley crew of pollsters show up to predict the turnout, but none is diligent enough to survey the entire line, or savvy enough to conduct a proper statistical sample. Instead, each pollster arbitrarily selects some single interval of the line and queries everyone in that interval. The intervals may be of various sizes, and they may overlap.

After comparing their haphazard data, the pollsters notice that candidate A, the presumptive favorite, has more than two-thirds support in every survey interval.

Assume the process is 'nice' in the following senses:

-Everyone truthfully reveals their intentions to the pollsters, and intentions don't change;
-Everyone in line eventually votes (once) for their intended candidate, and votes are faithfully counted;
-No one changes position in the line.

Also assume that everyone is surveyed at least once.

Can you conclude that candidate A is the winner?

Note that 2/3 is the lowest threshold one could hope for: for consider 4 voters, with prefereces going B A A B, and with 2 survey intervals consisting of the first three and last three voters respectively; then A has exactly 2/3 support in each, yet the outcome is a tie.

If this problem gives you trouble, try solving it first with 'two thirds' replaced by 'ninety-nine percent'.

I enjoy questions like this one, that try to deduce 'global' conditions from a collection of 'local' observations. With the advent of 'property testing' in computer science, they have become increasingly important to the field. I hope to make further posts on the subject soon, and possibly also discuss generalizations of the above problem (Does anyone know if/where it's been studied before?)

Update, 9/27: I've since learned that this problem falls in well-trod territory of functional and harmonic analysis; specifically, the (affirmative) answer to this problem essentially states that a certain 'maximal operator' over a function space is 'bounded'. The solution techniques described by David and myself in the comments are standard and fall under the rubric of 'covering theorems', the most basic one being due to Vitali. I'll try to blog more soon on what I've learned about this problem, which gets very interesting in 2 dimensions...

Labels: ,

Monday, July 09, 2007

Guardian Angel Decoding

In the conventional setting of error-correcting codes (ECCs), there are three parties: Sender, Receiver, and Adversary. Sender wants to send a message m to Receiver; problem is, Adversary may intercept the message string and modify as many as e > 0 symbols before Receiver gets the message.

The idea of an error-correcting code is to encode the message m in a redundant fashion as some 'codeword' C(m), so that even after as many as e errors are introduced, the original message m is uniquely recoverable.

For a simple example (not representative of the sophistication of more powerful codes), say Sender wants to transmit a single bit of information, and e = 1. Then we could have C(0) = 000, C(1) = 111, and Receiver can correctly recover the message bit by taking the majority vote of the 3 bits received.

For a much more comprehensive introduction of ECCs from a TCS perspective, see e.g. this survey by Sudan, or this one by Trevisan.

Now here's a slightly whimsical idea: suppose we add a fourth party to the scene, a Guardian Angel allied with Sender and Receiver. The Guardian Angel watches Sender encode the message; though it cannot completely stop the Adversary from making its modifications to the codeword, it can step in to prevent as many as d < e of these modifications. These 'saves' are performed all at once, after the Adversary reveals its intended modifications, and the Angel can choose which sites to protect; it cannot make new modifications of its own.

Clearly the Guardian Angel is a boon. Consider an ECC in the standard setting (no Angel) with an encoding/decoding system that successfully recovers messages even after as many as e errors are introduced; the same protocol can recover the message with as many as e + d errors, with the help of a 'lazy Angel' who arbitrarily selects d of the adversary's errors and corrects them.

So here's the question: is this as good as it gets? Or can the Angel act more cleverly to help Sender and Receiver succeed in the presence of even more errors?

Note that we might want to change the encoding/decoding protocol itself, to optimize for the Angel's presence. For example, suppose d = 1; then we can successfully transmit a single bit in the presence of any number of errors, by simply having the Angel encode the intended bit in the parity of the message sent (convince yourself this works). The challenge is to use the Angel effectively while preserving the information rate and other positive properties of our ECC, as much as possible.

This is meant as a puzzle, but a somewhat open-ended one (in particular, one that ought to be investigated in the context of specific ECCs) that feeds into active research in coding theory. I'm hoping it will stimulate readers to rediscover some of the important notions in the area. I'll give hints and/or literature pointers sometime soon.


...Ok, here goes:

Hint: 'List Decoding' is a concept which has seen a lot of applications recently. Its starting point is the realization that, if we are willing to not always uniquely recover the codeword after errors are introduced, but instead narrow down the possibilities to a 'short' list of candidate messages, we can, for many natural codes, tolerate many more errors. For instance, with a binary alphabet, we cannot achieve unique decoding for nontrivial codes beyond the threshold of a .25 fraction of errors. On the other hand, if we allow list decoding (say, returning a poly(n)-sized list of candidate messages given a corrupted codeword of length O(n)), then we can tolerate almost a .5 fraction of errors.

For more discussion and references, see this post of Lance's.

So: does list decodability of a code imply unique decodability with the help of a Guardian Angel, perhaps an Angel making only a relatively small number of 'saves'? And if the list decoding procedure is computationally efficient, can the Guardian Angel and its associated decoder be efficient as well?


For a more thoroughly whimsical puzzle scenario, check out Conway's Angel Problem.

Labels: ,