Andy's Math/CS page

Thursday, May 15, 2008

Random bits

This morning I was delighted to learn that Alan Baker, the Swarthmore professor whose Philosophy of Science class I took senior year, recently won the US Shogi championships--read his account here. Congrats, Prof. Baker!
I never learned Shogi, although Go--another Asian board game--was a big part of my youth and probably a decisive factor in my eventual interest in math/CS.

In other news, I am coauthor on a paper, 'The Power of Unentanglement', recently released on ECCC (and headed for CCC '08), jointly with Scott Aaronson, Salman Beigi, Bill Fefferman, and Peter Shor. It's about multiple-prover, single-round, quantum interactive proofs, and I encourage those whose cup of tea that is to take a look.

This is my first appearance in a conference paper, but not quite my first time in print--it happened once before, by accident. As a freshman in college, I asked my Linear Algebra professor, Steven Maurer, a question on an online class discussion board. Here it is, in the breathless and overwrought prose of my teenage years:

"This question has been haunting me, and I know I shouldn't expect definite answers. But how do mathematicians know when a theory is more or less done? Is it when they've reached a systematic classification theorem or a computational method for the objects they were looking for? Do they typically begin with ambitions as to the capabilities they'd like to achieve? I suppose there's nuanced interaction here, for instance, in seeking theoretical comprehension of vector spaces we find that these spaces can be characterized by possibly finite 'basis' sets. Does this lead us to want to construct algorithmically these new ensembles whose existence we weren't aware of to begin with? Or, pessimistically, do the results just start petering out, either because the 'interesting' ones are exhausted or because as we push out into theorem-space it becomes too wild and wooly to reward our efforts? Are there more compelling things to discover about vector spaces in general, or do we need to start scrutinizing specific vector spaces for neat quirks--or introduce additional structure into our axioms (or definitions): dot products, angles, magnitudes, etc.?

Also, how strong or detailed is the typical mathematician's sense of the openness or settledness of the various theories? And is there an alternative hypothesis I'm missing? "


Steve gave a thoughtful reply, years passed, and then a similar topic came up in coversation between himself and the mathematician and popular math writer Philip J. Davis. The conversation sparked an essay by Davis in which he quoted Maurer and I (with permission), the essay recently became part of a philosophy-of-math book ('Mathematics and Common Sense: A Case of Creative Tension'), and I got mailed a free copy--sweet! Once again, I recommend the book to anyone who enjoys that kind of thing. The essay is online.

Labels:

Saturday, May 10, 2008

Free on Friday?

You should come to Johnny D's in Davis Square and hear No Static, a 9-or-10-piece Steely Dan tribute band (standard preemptive clarification: Steely Dan is the band name, not a person).

No Static plays regularly in the area--I saw them in the fall and they were excellent, channeling the Dan's recordings with amazing care and fidelity.

Like most great artists, Steely Dan affords a unique perspective on the world, one best imbibed by listening to multiple albums in their weird entirety. This is exactly No Static's approach, so if you don't know Steely Dan, or have only heard a few catchy radio numbers, here's your chance to get hooked. Hope to see you there!

Thursday, May 01, 2008

Complexity Calisthenics (Part I)

Today I want to describe and recommend a paper I quite enjoyed: The Computational Complexity of Universal Hashing by Mansour, Nisan, and Tiwari (henceforth MNT). I think that this paper, while not overly demanding technically, is likely to stretch readers' brains in several interesting directions at once.

As a motivation, consider the following question, which has vexed some of us since the third grade: why is multiplication so hard?

The algorithm we learn in school takes time quadratic in the bit-length of the input numbers. This is far from optimal, and inspired work over the years has brought the running time ever-closer to the conjecturally-optimal n log n; Martin Furer published a breakthrough in STOC 2007, and there may have been improvements since then. But compare this with addition, which can easily be performed with linear time and logarithmic space (simultaneously). Could we hope for anything similar? (I don't believe that any of the fast multiplication algorithms run in sublinear space, although I could be mistaken. Here is Wiki's summary of existing algorithms for arithmetic.)

As MNT observe, the question is made especially interesting when we observed that multiplication could be just as easily achieved in linear time/logspace... if we were willing to accept a different representation for our integer inputs! Namely, if we're given the prime factorizations of the inputs, we simply add corresponding exponents to determine the product. There are two hitches, though: first, we'd have to accept the promise that the prime factorizations given really do involve primes (it's not at all clear that we'd have the time/space to check this, even with the recent advances in primality testing); second, and more germane to our discussion, addition just got much harder!

The situation is similar over finite fields of prime order (Z_p): in standard representation, addition is easy and multiplication is less so, while if we represent numbers as powers of a fixed primitive root, the reverse becomes true. This suggests a woeful but intriguing possibility: perhaps no matter how we represent numbers, one of the two operations must be computationally complex, even though we have latitude to 'trade off' between + and *. So we are invited to consider

Mental Stretch #1: Can we prove 'representation-independent' complexity lower bounds?

Stretch #2: Can we prove such lower bounds in the form of tradeoffs between two component problems, as seems necessary here?

In the setting of finite-field arithmetic, MNT answer 'yes' to both problems. The lower bounds they give, however, are not expressed simply in terms of time usage or space usage, but instead as the product of these two measures. Thus we have

Stretch #3: Prove 'time-space tradeoffs' for computational problems.

To be clear, all three of these 'stretches' had been made in various forms in work prior to MNT; I'm just using their paper as a good example.

The combination of these three elements certainly makes the task seem daunting. But MNT have convinced me, and I hope to suggest to you, that with the right ideas it's not so hard. As the paper title indicates, their point of departure is Universal Hashing (UH)--an algorithmic technique in which finite fields have already proved useful. Use upper bounds to prove lower bounds. We can call this another Stretch, or call it wisdom of the ages, but it deserves to be stated.

So what is UH? Fix a domain D and a range R. a (finite) family H of functions from D to R is called a Universal Hash Family (UHF) if the following holds:

For every pair of distinct elements d, d' in D,

and for every pair of (not necessarily distinct) elements r, r' in R,

if we pick a function h at random from H,

Prob[h(d) = r, h(d') = r' ] = 1/|R|^2.

In other words, the randomly chosen h behaves just like a truly random function from D to R when we restrict attention to any two domain elements. (In typical applications we hope to save time and randomness, since H may be much smaller than the space of all functions.)


Here is what MNT do: they prove that implementing any UHF necessitates a complexity lower bound in the form of a time-space product. (To 'implement' a UHF H is to compute the function f_H(h, x) = h(x).)

This is in fact the main message of the paper, but they obtain our desired application to + and * as a corollary, by citing the well known fact that, fixing a prime field Z_p = D = R,

H = {h_{a, b}(x) = a*x + b mod p}

is a UHF, where a, b range over all field elements. (Left as an easy exercise.)

Note the gain from this perspective: implementing a UHF is a 'representation-invariant' property of a function, so Stretch 1 becomes possible. Moreover, Stretch 2 now makes more sense: it is only jointly that + and * define a UHF, so whatever complexity measure M we lower-bound for UHFs implies only a tradeoff between + and *.

It remains only to sketch the proof of time-space tradeoffs for UHFs, which in fact is a manageable argument along classic lines (the basic form of the argument is attributed to earlier work by Borodin and Cook). The upshot for us will be that for any program computing (any representation of) f(a, b, x) = a*x + b over an n-digit prime modulus, if T denotes worst-case time usage and S worst-case space, T*S = Omega(n^2). Very nice! (Although what this actually implies about the larger of the individual time-space products of + and * under this representation is not clear to me at the moment.)

Let's adjourn for today... wouldn't want to pull a muscle.

Labels:

Wednesday, April 16, 2008

A simple plan to improve your graduate program

It's just this: Food is key. We need more food. (In what follows, I'm speaking not just for MIT theory students, but for all students everywhere.)

Cancel the subscription to 'Journal of Timed Networked Multithreaded Aqueous Automata', and a few others. You've just saved about $20,000.

Use the money to provide copious snacks for students and faculty. Weekly receptions help, but really we're hungry all the time. To elaborate on that point:

-The student center is a tiresome 5-10 minutes away.

-Graduate students are low on cash. We work strange hours that discourage grocery shopping (and may not own a car). Some of us are newly weaned from the meal plans of our undergrad days, and we're only slowly learning to provide for ourselves. The food around here is expensive.

If department budget is truly an issue, there is another way, practiced with admirable success by UC San Diego's CSE department: recruit grad student volunteers to maintain a stocked snack-room, with foods purchased cheaply in bulk and paid for on the honor system. Of course, a snack-room should also be a social space.

Candy and tasty treats help, but it's too easy to over-rely on them and come crashing down. At some point we all wish there were less of these around the office. Consider in their stead:

-bagels
-raisins
-apples and bananas
-peanuts and peanut butter

...all cheap, real-tasting, and calorific.

That's it! An easy, cost-effective intervention that will keep students and faculty working happily in their offices on their next theorem or patentable device.

Labels:

Friday, April 04, 2008

Some babies grow in a peculiar way

Today I bumped into an order of growth I hadn't seen before, and thought I'd share it for a modest bit of mental aerobics.

Readers may well have seen functions of form

f(n) = (log n)^c,

known as 'polylogarithmic'. These are important in, e.g., query complexity, where we like the number of queries to be much smaller than the input size n when possible. Also emerging from such studies are the 'quasipolynomial' functions, of form

g(n) = 2^{(log n)^c}.

As a warmup--how fast do these things grow?


OK, now the main course tonight is the following:

h(n) = 2^{2^{(log log n)^c}}.

What do you make of these? And what questions need answering before we understand such a growth rate 'well enough'? I'm unsure and would love to hear your thoughts.

Labels:

Friday, March 21, 2008

News, and some Number Theory

Time for a personal update: I'm enjoying myself greatly here in Cambridge, and was recently admitted as a transfer student to MIT's EECS department. The move has both personal and academic advantages for me, but let me emphasize that I think UCSD has a lot to offer prospective students, and in addition is home to many cool people whom I miss... I would be happy to offer advice about either school.

When I started writing here as an undergrad, I knew very few people in the worlds of TCS and Complexity, and the blog was a way of reaching out to a community I wanted to join. Now, thanks in part to blogging (thru which I met Scott, my advisor), I'm immersed in that community at a school with a large, vibrant Theory group. This is to say that, though writing here still interests me, it no longer answers an urgent personal need, and I am most likely to post new material in response to a request for specific topics or post types.


Today I was perusing a favorite book of mine, 'Gems of Theoretical Computer Science' by U. Schoning & R. Pruim. One chapter partially treats the undecidability of determining whether systems of Diophantine equations (polynomial equations where solutions are required to be integral) have a solution. This was one of Hilbert's problems from 1900, solved in 1971 and full of deep math.

The authors pose the following exercise: show that the version where variables must take nonnegative values, is reducible to the integer version. (And vice-versa; also the rational-values version is reducible to the integral version; the converse appears to be open...)

Think about it...



The reduction they give: Given a set of polynomial equations {P_i(X_1, ... X_k)}, we want to determine if they're simultaneously satisfiable with nonnegative values. Introduce, for each j <= k, the variables Y_{j, 1}, Y_{j, 2}, Y_{j, 3}, Y_{j_4}.

Now add to the system, for each j <= k, the constraint that

X_j = Y_{j, 1}^2 + Y_{j, 2}^2 + Y_{j, 3}^2 + Y_{j_4}^2.

Claim: the new system is satisfiable over the integers iff the original one is satisfiable over the nonnegative integers! The proof, of course, is an easy consequence of Lagrange's Theorem that every nonnegative integer is the sum of 4 integer squares.

So, my question is, could a reduction be found that doesn't rely on Lagrange's Theorem? Or its weaker variants where the constant 4 is replaced with some constant c > 4. Or maybe for some constant c, the proof is really so simple that I will be satisfied that we are performing this reduction with the cheapest tools.

If our plan in the reduction is to constrain the original variables in a form analogous to the above, namely

X_j = Q_j(Y, Z, ...),

where Y, Z, ... are new integral variables, is there any way around proving a version of Lagrange's Theorem? Generally we show that polynomials are identically nonnegative by expressing them as a sum of one or more squares, e.g.,

Y^2 + Z^2 - 2YZ = (Y - Z)^2.

Using this template, we'd be reliant on Lagrange. However, in his thesis, Minkowski conjectured that there exist nonnegative real polynomials *not* so expressible, and this was proved by Hilbert. A simple example I found online is

Q(X, Y, Z) = X^4*Y^2 + Y^4*Z^2 + Z^4*Y^2 - 3X^2*Y^2*Z^2,

and another counterexample is derived systematically in the classy and useful inequalities book 'The Cauchy-Schwartz Master Class' by J.M. Steele. (Artin proved, however, that every poly that's identically nonnegative, is expressible as the sum of finitely many *rational* functions squared.)

Could it be that one of these creatures is easier to use in the reduction (both to prove it's nonnegative, and ranges over all nonnegative values)? Somehow I doubt it. Anyways, I just wanted to point to an instance where a reduction one would expect to be straightforward seems to require fairly nontrivial understanding of the problem domain.

Labels: ,

Friday, March 07, 2008

Beasts of Probability and Plane Geometry

Say you're trying to predict whether some event E occurs or not. There is another collection of events I_1, I_2, ... I_k, which are positive predictors of E: for every j, E occurs with probability at least .99 conditioning on the event that I_j occurs.

Can we lower-bound the probability that E occurs conditioning on the event that *at least one* I_j occurs?

(Think about it before reading on.)



Here's a simple example of what can go wrong: let the underlying probability space be a sequence of n unbiased coin flips. Let E be the event that at least 2/3 of the flips come up heads. For each subset S of {1, 2, ... n} of size exactly
.4n, let I_S be the event that all the coin flips indexed by S come up heads.

If n is large enough, we have that

i) E is almost surely false, yet

ii) Almost surely, some I_S is satisfied--even though

iii) Conditioning on any fixed event I_S, E becomes almost surely true (since we then expect half of the remaining flips to come up heads, yielding about a .4 + .5*.6 = .7 fraction of heads total).

One can also modify this example to make conditioning on the union of the I_j's actually decrease the probability that E occurs.


This kind of conclusion seems somewhat pathological and inconvenient, so it's natural to look for restrictions that prevent it from arising. The simplest would be to restrict the number, k, of predictor variables: for the above hypotheses, we have that the probability of E conditioning on the union of the I_j's is at least

.99 / (1 + .01(k - 1)).

(to see this, think about the worst possible case, which resembles a 'sunflower' in probability space.)

A more interesting direction is to restrict the structure of the predictor events within probability space. For instance, suppose that the probability space is a uniformly drawn point from the unit interval, E is some arbitrary subset of the interval, and each I_j is the indicator variable for some fixed subinterval. Then, regardless of the number of I_j, we can conclude that E occurs with high probability conditioning on the union of I_j's; not quite .99, but close. See my closely related earlier post for details.



It is natural to try to extend this result to higher dimensions, and for predictor indicator-sets given by axis-parallel rectangles this succeeds by an induction (although the effect gets exponentially weaker with the dimension). Similar results hold if the sets are required to be 'fat' convex bodies, in which case an argument much like the 1-dimensional one works.

However, allowing rotated rectangles destroys the effect even in two dimensions. Here's one interpretation: consider the unit square as a city, and take the set S to be the set of Democratic households.

In pathological cases, it's possible to find a covering of the square by overlapping convex 'precincts', such that

i) in each precinct, 99% of the households are Democratic, yet

ii) overall, 99% of houses are Republican!

Such sets seem truly bizarre. For a long time I was convinced they couldn't exist, but after failing to prove this, I finally tracked down a construction in a Harmonic Analysis textbook by Elias Stein (who, besides seeming impressive in his own right, was PhD advisor to Fields Medalists Charles Fefferman and Terence Tao). These sets, whose construction resembles a kind of hydra spawning ever more and tinier heads, are related to the more well-known Besicovitch/Kakeya sets. One can even achieve a kind of fantastical limit, in the following theorem for which Stein provides a reference:

There exists a subset S of measure zero in the unit square, such that for every point x on the square, there exists a line L thru x, such that S contains all of L except, possibly, x itself!

Labels: ,