We've all heard that randomness and probability are becoming more and more important in computer science. A number of good references exist for learning the basic techniques of probabilistic analysis. Many of these techniques are radically simple and incredibly powerful, even for solving problems where probability does not seem to play a role (such as design or optimization problems where what is wanted is a highly 'uniform' or 'balanced' structure, e.g. in exhibiting Ramsey graphs, expanders/extractors, low-discrepancy sets, etc.). As such they are a very good investment. One can go very far, for instance, granted only familiarity with the meaning and some applications of the following simple concepts:
-linearity of expectation
-the second moment (a.k.a. variance) method
-union bounds
-Chernoff-style deviation bounds
As I've said in the past,
The Probabilistic Method by Alon and Spencer is the most delightful way to develop an appreciation for this stuff. I have blogged before on some of these ideas, and will happily do so again, but today I want to discuss an aspect of probabilistic analysis that receives somewhat less thorough discussion in the references I've consulted.
Namely: given two random variables
X, Y, how should we quantify their similarity or difference? As an example of a case where such a question might be important, suppose we have a randomized algorithm that correctly solves some problem with high probability if given access to fair coin flips, but we don't have a 'perfect' source of random bits; we want to find conditions under which an imperfect or pseudorandom source will be 'close enough' to the uniform distribution to play the role of the algorithm's random bits, without destroying its correctness.
Is there one 'right' notion of similarity, or do we need multiple notions depending on application? The answer is, 'both'. There is a very widely useful quantity, called the 'statistical distance', which is effective in many cases, but to recognize these cases it is important to have several perspectives on the matter.
I will present the definition and three variants. I encourage readers to submit any others, and also to point me to official nomenclature and references for the variants (I'm working from memory).
Let
X, Y be random variables. Let's say they take on (real-number) values from the finite set
{a_1, a_2, ... a_n}, just to keep the discussion nice and safe. Suppose
X, Y take on value
a_i with probabilities
p_i, q_i respectively.
The 'statistical distance'
d(X, Y) is then defined as
d(X, Y) = .5 * (Sum from i=1 to i= n: |p_i - q_i|).
(That's an absolute value in there.)
Also denoted
||X - Y||, the statistical distance should not be confused with the L_1 distance, which is just twice the statistical distance when applied to probability vectors.
Readers should verify the following:
d(X, Y) is a
metric, taking values between 0 and 1.
d(X, Y) = 0 if and only if
X, Y are identically distributed, and
d(X, Y) = 1 if and only if
X, Y never take on the same values.
Now consider the three following questions we could ask about
X and
Y:
1) Say we want to build a 'distinguisher' to tell apart the two variables: that is, a function
A: {a_1, ... a_n} --> {0, 1} that maximizes the quantity
|P[A(X) = 1] - P[A(Y) = 1]|. (I believe this is the 'distinguishing probability' of
A, but don't quote me.) Call
Q1(X, Y) the maximum quantity achievable in this way.
(Note that we could allow
A to use additional randomness; it wouldn't affect the value
Q1. This is important in applications.)
2) Now say a friend secretly chooses one of the two variables
X or
Y, each with probability 1/2. She then takes a random sample from the chosen variable and presents it to us. We are asked to guess which sample it came from, and we wish to find a rule that'll maximize our probability of guessing right.
Let
Q2(X, Y) denote the maximum achievable probablity of correctness (which again is unaffected by the use of randomness in guessing).
3) Finally, say we wish to design a random variable
Z = (Z_1, Z_2), outputting a pair of real values. We require that
Z_1 be distributed identically to
X (i.e.
d(X, Z_1) = 0), and that
Z_2 be distributed identically to
Y, but we do not require independence. Our goal is design the pair
(Z_1, Z_2) (called a 'coupling' of
X and
Y) to maximize the probability that
Z_1 = Z_2. Denote by
Q3(X, Y) this maximum achievable probability.
Now, it is a pleasant fact that all of Q1, Q2, Q3 are completely determined by the value
d(X, Y), and conversely, they also each determine
d(X, Y) (and each other). Moreover, the quantities are all
linearly related, so if you forget the precise relationships (as I invariably do, which motivated this post), they're easy to rederive by considering the extreme cases and interpolating. For the record, here are the relationships:
Q1(X, Y) = d(X, Y)Q2(X, Y) = .5 + .5 * d(X, Y)Q3(X, Y) = 1 - d(X, Y)I recommend verifying these facts once or twice and then taking them for granted. Note that the first two (of which Q1 is more frequently used) are measures of distance, while
Q3 is a measure of similarity. Both perspectives come in handy.
Note too that, while
d(X, Y) is defined by an equation,
Q1-Q3 are defined by an ability to
do something;
Q1-Q2 in particular are framed around an agent's ability to distinguish between cases, and it is natural that computer scientists often prefer to take such a view.
In future posts, I hope to elaborate somewhat on these perspectives (couplings in particular, which are probably the most interesting).
Labels: general math, probability