# Andy's Math/CS page

## Monday, August 27, 2007

### Loopy Thinking

Earlier I posted on the 'backwards orientation' in mathematics, aimed at finding new characterizations or 'origin stories' for familiar objects. Here is an exercise in this kind of math, hopefully amusing.

We'll consider the class of simple, rectifiable closed curves in the plane, that is, non-self-intersecting continuous 'loops' in R^2 to whose segments a definite finite arc-length can always be assigned.

Given two points x, y on such a curve C, let d_C(x, y) denote the distance along the curve from x to y (in whichever direction is shorter). On the other hand, let d(x, y) denote regular Euclidean distance.

We say that a curve respects distances if, for all u, v, x, y on C, we have

d_C(u, v) < d_C(x, y) if and only if d(u, v) < d(x, y).

Now, a circle respects distances in this sense; do any other curves in our class?

Would your answer change if we used a weaker notion of respecting distances? Namely, for all x, y, z in C,

d_C(x, y) < d_C(x, z) iff d(x, y) < d(x, z).

Bonus problem: the analogous question for rectifiable-path-connected, closed plane sets with nonempty interior. Convex bodies respect distances; do any others? (Recall that we previously discussed a very different kind of characterization of convexity.)
It may help to know that between any two points in such a set, there exists a path of minimum length (see Kolmogorov & Fomin's book on functional analysis, Sect. 20, Thm 3.).

Labels: , ,

## Saturday, August 25, 2007

### More on Couplings

Last time I described how couplings can be used to establish 'closeness' of two random variables. In fact, the results I stated imply that couplings can always be constructed when RVs are close. That still begs the question of whether couplings are a practical method of analysis for important RVs. Well, they definitely are, although to judge the full extent of it you'd have to ask a probabilist.

Beyond their direct usefulness, however, I think couplings also illustrate a broader idea: that to reason effectively about a probability distribution, it often pays to find new ways of 'generating' that distribution to better highlight certain features. This is something I learned to do by example in my classes, but always with lingering doubts as to the legitimacy of such reasoning and without an explicit awareness of the method and its generality. (Have readers had similar experiences?)

Here's a very simple example--the simplest--as a warm-up. Let X, Y be 0/1 random variables, with probabilities p, q respectively of getting a 1. Say p < q and the two numbers are 'close'.

Even though it is immediate from the definition of statistical distance that X, Y are close as RVs (which does not mean they are correlated!), let's see this in a second way by 'coupling' the variables.

Let T be a uniformly chosen real number in the unit interval. Then we can build a pair of coupled RVs by defining

X' = 1 iff T > 1 - p, Y' = 1 iff T > 1 - q.

Clearly X', Y' are individually identically distributed to X, Y respectively. (We said nothing about the joint distribution of X, Y, so it is irrelevant and distracting to ask if they are really 'the same' RVs as X', Y'. This is the kind of question that used to hang me up.) Moreover, X' = Y' unless T lies in the interval [p, q] (which occurs with prob. p - q). This shows X, Y are close via the coupling characterization of statistical distance.

Hope that was clear; here's another simple example.

Let m > n > 0 be natural numbers, m even. Fill an urn with m balls, half red, half blue. Draw n balls at random without replacement, and record the sequence of red and blue outcomes as a length-n bitstring X_n.

Compare X_n with U_n, the uniform distribution on {0, 1}^n; fixing m, how large can n be before the RVs cease to be close?

First, note the extreme cases:

-if n = 1 the RVs are identical;

-if n = m then they have a statistical distance on the order of 1 - 1/sqrt(n), since then X_n always has a Hamming weight of m/2 while U_n usually doesn't (so, we can construct a succesful distinguisher--recall the other characterizations of statistical distance from the last post).

Let's use coupling to examine an intermediate case: we will show that if n << sqrt(m), the RVs remain close. First, note that U_n can be generated by sampling balls with replacement. Let's sample with replacement n times to generate U_n, but now each time 'marking' the ball we draw before throwing it back in the urn. The marks don't affect the outcome of U_n.

To generate X_n in a coupled fashion, use the same sequence of draws, except that we ignore draws that bring up a marked ball, and we perform additional draws if needed until we record n outcomes.

Standard 'birthday paradox' reasoning assures us that if n << sqrt(m), then with high probability no marked ball is ever drawn, hence the two coupled RVs are identical, as needed.

What is the critical value of n for which X_n, U_n 'part ways' and become distinguishable? Is it O(sqrt(m)), or much larger? I could say more, but I still don't understand the issue as fully as I'd like, so I'll leave it as a (hopefully tantalizing) question for readers.

Next time: couplings applied to convergence of random walks on graphs.

Labels:

## Wednesday, August 08, 2007

### Another Public Service Announcement

We've all heard that randomness and probability are becoming more and more important in computer science. A number of good references exist for learning the basic techniques of probabilistic analysis. Many of these techniques are radically simple and incredibly powerful, even for solving problems where probability does not seem to play a role (such as design or optimization problems where what is wanted is a highly 'uniform' or 'balanced' structure, e.g. in exhibiting Ramsey graphs, expanders/extractors, low-discrepancy sets, etc.). As such they are a very good investment. One can go very far, for instance, granted only familiarity with the meaning and some applications of the following simple concepts:

-linearity of expectation

-the second moment (a.k.a. variance) method

-union bounds

-Chernoff-style deviation bounds

As I've said in the past, The Probabilistic Method by Alon and Spencer is the most delightful way to develop an appreciation for this stuff. I have blogged before on some of these ideas, and will happily do so again, but today I want to discuss an aspect of probabilistic analysis that receives somewhat less thorough discussion in the references I've consulted.

Namely: given two random variables X, Y, how should we quantify their similarity or difference? As an example of a case where such a question might be important, suppose we have a randomized algorithm that correctly solves some problem with high probability if given access to fair coin flips, but we don't have a 'perfect' source of random bits; we want to find conditions under which an imperfect or pseudorandom source will be 'close enough' to the uniform distribution to play the role of the algorithm's random bits, without destroying its correctness.

Is there one 'right' notion of similarity, or do we need multiple notions depending on application? The answer is, 'both'. There is a very widely useful quantity, called the 'statistical distance', which is effective in many cases, but to recognize these cases it is important to have several perspectives on the matter.

I will present the definition and three variants. I encourage readers to submit any others, and also to point me to official nomenclature and references for the variants (I'm working from memory).

Let X, Y be random variables. Let's say they take on (real-number) values from the finite set {a_1, a_2, ... a_n}, just to keep the discussion nice and safe. Suppose X, Y take on value a_i with probabilities p_i, q_i respectively.

The 'statistical distance' d(X, Y) is then defined as

d(X, Y) = .5 * (Sum from i=1 to i= n: |p_i - q_i|).

(That's an absolute value in there.)

Also denoted ||X - Y||, the statistical distance should not be confused with the L_1 distance, which is just twice the statistical distance when applied to probability vectors.

Readers should verify the following: d(X, Y) is a metric, taking values between 0 and 1. d(X, Y) = 0 if and only if X, Y are identically distributed, and d(X, Y) = 1 if and only if X, Y never take on the same values.

Now consider the three following questions we could ask about X and Y:

1) Say we want to build a 'distinguisher' to tell apart the two variables: that is, a function A: {a_1, ... a_n} --> {0, 1} that maximizes the quantity
|P[A(X) = 1] - P[A(Y) = 1]|. (I believe this is the 'distinguishing probability' of A, but don't quote me.) Call Q1(X, Y) the maximum quantity achievable in this way.

(Note that we could allow A to use additional randomness; it wouldn't affect the value Q1. This is important in applications.)

2) Now say a friend secretly chooses one of the two variables X or Y, each with probability 1/2. She then takes a random sample from the chosen variable and presents it to us. We are asked to guess which sample it came from, and we wish to find a rule that'll maximize our probability of guessing right.
Let Q2(X, Y) denote the maximum achievable probablity of correctness (which again is unaffected by the use of randomness in guessing).

3) Finally, say we wish to design a random variable Z = (Z_1, Z_2), outputting a pair of real values. We require that Z_1 be distributed identically to X
(i.e. d(X, Z_1) = 0), and that Z_2 be distributed identically to Y, but we do not require independence. Our goal is design the pair (Z_1, Z_2) (called a 'coupling' of X and Y) to maximize the probability that Z_1 = Z_2. Denote by Q3(X, Y) this maximum achievable probability.

Now, it is a pleasant fact that all of Q1, Q2, Q3 are completely determined by the value d(X, Y), and conversely, they also each determine d(X, Y) (and each other). Moreover, the quantities are all linearly related, so if you forget the precise relationships (as I invariably do, which motivated this post), they're easy to rederive by considering the extreme cases and interpolating. For the record, here are the relationships:

Q1(X, Y) = d(X, Y)

Q2(X, Y) = .5 + .5 * d(X, Y)

Q3(X, Y) = 1 - d(X, Y)

I recommend verifying these facts once or twice and then taking them for granted. Note that the first two (of which Q1 is more frequently used) are measures of distance, while Q3 is a measure of similarity. Both perspectives come in handy.

Note too that, while d(X, Y) is defined by an equation, Q1-Q3 are defined by an ability to do something; Q1-Q2 in particular are framed around an agent's ability to distinguish between cases, and it is natural that computer scientists often prefer to take such a view.

In future posts, I hope to elaborate somewhat on these perspectives (couplings in particular, which are probably the most interesting).

Labels: ,