# Andy's Math/CS page

## Friday, March 07, 2008

### Beasts of Probability and Plane Geometry

Say you're trying to predict whether some event E occurs or not. There is another collection of events I_1, I_2, ... I_k, which are positive predictors of E: for every j, E occurs with probability at least .99 conditioning on the event that I_j occurs.

Can we lower-bound the probability that E occurs conditioning on the event that *at least one* I_j occurs?

(Think about it before reading on.)

Here's a simple example of what can go wrong: let the underlying probability space be a sequence of n unbiased coin flips. Let E be the event that at least 2/3 of the flips come up heads. For each subset S of {1, 2, ... n} of size exactly
.4n, let I_S be the event that all the coin flips indexed by S come up heads.

If n is large enough, we have that

i) E is almost surely false, yet

ii) Almost surely, some I_S is satisfied--even though

iii) Conditioning on any fixed event I_S, E becomes almost surely true (since we then expect half of the remaining flips to come up heads, yielding about a .4 + .5*.6 = .7 fraction of heads total).

One can also modify this example to make conditioning on the union of the I_j's actually decrease the probability that E occurs.

This kind of conclusion seems somewhat pathological and inconvenient, so it's natural to look for restrictions that prevent it from arising. The simplest would be to restrict the number, k, of predictor variables: for the above hypotheses, we have that the probability of E conditioning on the union of the I_j's is at least

.99 / (1 + .01(k - 1)).

(to see this, think about the worst possible case, which resembles a 'sunflower' in probability space.)

A more interesting direction is to restrict the structure of the predictor events within probability space. For instance, suppose that the probability space is a uniformly drawn point from the unit interval, E is some arbitrary subset of the interval, and each I_j is the indicator variable for some fixed subinterval. Then, regardless of the number of I_j, we can conclude that E occurs with high probability conditioning on the union of I_j's; not quite .99, but close. See my closely related earlier post for details.

It is natural to try to extend this result to higher dimensions, and for predictor indicator-sets given by axis-parallel rectangles this succeeds by an induction (although the effect gets exponentially weaker with the dimension). Similar results hold if the sets are required to be 'fat' convex bodies, in which case an argument much like the 1-dimensional one works.

However, allowing rotated rectangles destroys the effect even in two dimensions. Here's one interpretation: consider the unit square as a city, and take the set S to be the set of Democratic households.

In pathological cases, it's possible to find a covering of the square by overlapping convex 'precincts', such that

i) in each precinct, 99% of the households are Democratic, yet

ii) overall, 99% of houses are Republican!

Such sets seem truly bizarre. For a long time I was convinced they couldn't exist, but after failing to prove this, I finally tracked down a construction in a Harmonic Analysis textbook by Elias Stein (who, besides seeming impressive in his own right, was PhD advisor to Fields Medalists Charles Fefferman and Terence Tao). These sets, whose construction resembles a kind of hydra spawning ever more and tinier heads, are related to the more well-known Besicovitch/Kakeya sets. One can even achieve a kind of fantastical limit, in the following theorem for which Stein provides a reference:

There exists a subset S of measure zero in the unit square, such that for every point x on the square, there exists a line L thru x, such that S contains all of L except, possibly, x itself!

Labels: ,