Andy's Math/CS page

Making academic contacts (some thoughts for new researchers)

2014-10-18T16:19:00.000-04:00

Suppose you're an undergraduate hoping to go into academic research, or a beginning grad student. It could be very helpful to have academic contacts at other schools---such as professors, but also maybe postdocs and grad students. (I'm concerned here with starting scientific discussions and/or collaborations, not making contacts for graduate admissions per se. Admissions decisions will be made based on your application. But a successful collaboration and resulting publication, or letter of support, is one of the best things you can hope to add to your applications.)

Say you've never written a paper, and may not yet have the time or readiness to solve major problems on your own; but you want to have real scientific conversations and perhaps even collaboration with a busy researcher you admire. What could you possibly talk about? What can you offer them? Below is my advice.

***

1) In my view, the best approach to making contacts is closely related to how I'd recommend spending your own study time as a beginning researcher: (a) start reading research papers as soon as possible, and (b) try diligently to ask interesting follow-up questions to the papers you like.

Asking questions is a central research skill. It is one you can start learning early on because, even in technically difficult fields, simple patterns of questioning often recur in many forms across papers. E.g., in the theory of computing, if there is a new model of computation being studied, it is likely to have randomized or nondeterministic versions that can be studied next. Or if there is an algorithmic problem, it might have variant problems to be solved in, say, the communication protocol or decision-tree models. And so on.

Even though it is not so difficult to produce these variations, few people do so systematically. If you keep at it, and keep trying to learn and adopt more patterns of questioning, you will hit upon interesting and solvable problems. You will also start to develop a taste about which problems are likely to be most interesting, which are too hard, and so on. (Side note---I believe that keeping a research journal with your questions greatly helps this process.)

2) Now returning to the goal of making contacts, I think that one of the best kinds of email you can send is one that contains a good research question, reasonably related to the person's research area. If you can do this and pique their interest, they may well enter into a dialogue that could become a full-fledged collaboration---regardless of your credentials on paper. After all, you've already brought something important to the table.

A key advantage of this approach to making contacts is that you're aiming to attract the researcher's own interest and curiosity, rather than just asking them for something. Another advantage is that there is a lot of freedom in asking and considering research questions. If you ask someone an interesting, specific question about topic X, there is no presumption that X is your only interest or that future interactions will be limited to it. And in the course of your correspondence, you might realize there is a better question to ask about X, or you might get around to asking something about topic Y as well. That's how scientific interactions go.

So you don't have to worry about choosing the exact right question to represent yourself, as long as it is good and leads to discussion. In this respect it's less stressful than trying to introduce yourself by defining your whole research outlook. And it's certainly more promising than suggesting a collaboration based on your GPA or work experience.

Of course, asking good questions isn't easy and takes work. You should think carefully about them before sending, try your best to answer them yourself, and see if there are initial observations or partial solutions you can provide with your question to show you're serious. Maybe in the end it will be a question they can answer right away; but again, it could still lead to other questions. There is little risk in trying and it is likely to at least be a useful exercise.

A geometric-graphs offering

2011-09-28T12:35:00.000-04:00

Introduction

After a night of board games, I found myself thinking about the peculiarities of movement on discrete versions of the plane. This suggested a number of questions. As they would likely suffer neglect at my hands, I'm posting them here for others to enjoy---any ideas or references are very welcome.

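The basic structure that got me thinking was the 8-neighborhood (or Moore) graph, whose vertices are the integer points of the plane, with each point adjacent to the 8 points that differ from it by at most 1 in each coordinate.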

This graph (which we'll denote by $G_{8N}$) describes how a chess king moves on an infinite chessboard. It's often convenient for game designers, but there's something... wrong about it: it distorts distances in the plane.

To make this formal, let $G$ be a (finite or countably infinite) undirected graph. Let $d_G(u, v)$ denote the shortest-path distance in $G$ between vertices $u, v$.

We say $F: V(G) \rightarrow \mathbf{R}^2$ is an embedding of $G$ if $|| F(u) - F(v)||_2 \geq d_G(u, v)$ for all distinct vertices $u, v$. (This is just a normalization convention.) Define the distortion of $F$ as the maximum (supremum) of

\[ \frac{|| F(u) - F(v) ||_2}{d_G(u, v)} , \quad{} u \neq v . \]

The study of low-distortion embeddings (which can be pursued in a more general setting) has been a highly active TCS research topic, largely due to its role in designing efficient approximation algorithms for NP-hard problems. My initial focus here will be on embeddings for periodic and highly symmetric graphs like $G_{8N}$.

As an example, look at the usual embedding of the 8-neighborhood graph into the plane. This has a distortion of $\sqrt{2}$, witnessed by points along a diagonal.
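As a quick numerical sanity check (not a proof, and only over a finite window of the grid), here is a Python sketch that computes the ratio $|| F(u) - F(v) ||_2 / d_G(u, v)$ for the usual embedding $F(i, j) = (i, j)$. It uses the fact that $d_{G_{8N}}$ is the Chebyshev distance $\max(|\Delta x|, |\Delta y|)$, since a king needs that many moves. The function names are my own, for illustration only.

```python
import itertools
import math

def chebyshev(u, v):
    # Shortest-path distance in G_8N: a king needs max(|dx|, |dy|) moves.
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))

def distortion_usual_embedding(radius):
    # Usual embedding F(i, j) = (i, j), restricted to the window [-radius, radius]^2.
    pts = [(i, j) for i in range(-radius, radius + 1)
                  for j in range(-radius, radius + 1)]
    worst = 0.0
    for u, v in itertools.combinations(pts, 2):
        euclid = math.dist(u, v)
        d = chebyshev(u, v)
        assert euclid >= d  # normalization ||F(u) - F(v)|| >= d_G(u, v) holds
        worst = max(worst, euclid / d)
    return worst

print(distortion_usual_embedding(3))  # about 1.41421, i.e. sqrt(2)
```

The maximum is attained on diagonal pairs, as claimed. The lower bound in the warm-up is a statement about all embeddings, which no finite search of one embedding can supply.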

Warm-up: Show that $\sqrt{2}$ is the minimum distortion of any embedding of the 8-neighborhood graph $G_{8N}$.

Symmetries and Distortion

Now a basic observation here is that, when we started with a graph with a high degree of inherent symmetry, we found that its optimal (distortion-minimizing) embedding was also highly symmetric. I would like to ask whether this is always the case.

For background, an automorphism of $G$ is a bijective mapping $\phi$ from $V(G)$ to itself, such that $(u, v) \in E(G) \Leftrightarrow (\phi(u), \phi(v)) \in E(G)$.

Let's say that a graph $G$ has 2D symmetry if there's an embedding $F$ of $V(G)$ into the plane, and linearly independent vectors $\mathbf{p}, \mathbf{q} \in \mathbf{R}^2$, such that a translation of the plane by $\mathbf{p}$ or by $\mathbf{q}$ induces an automorphism of $G$ (in the obvious way). In this case we also say the embedding $F$ has 2D symmetry.

So for example, with the usual embedding of $G_{8N}$, we can take $\mathbf{p} = (1, 0), \mathbf{q} = (0, 1)$.

Question 1: Suppose $G$ has 2D symmetry. Does this imply that there is a distortion-minimizing embedding of $G$ with 2D symmetry?

Say that $G$ is transitive if "all points look the same": there's an automorphism of $G$ mapping any vertex $u$ to any other desired vertex $v$. Similarly, say that an embedding $F$ of $G$ is transitive if translating the plane by any vector of the form $(F(u) - F(v))$ induces an automorphism of $G$. (The usual embedding of $G_{8N}$ is transitive.)

Question 2: Suppose $G$ has 2D symmetry and is transitive. Is there a distortion-minimizing, transitive embedding of $G$ with 2D symmetry?

Question 3: Suppose $G$ has 2D symmetry, and is presented to us (in the natural way, by a finite description of a repeating "cell"). What is the complexity of determining the minimum distortion of any embedding of $G$? What about the case where $G$ is also transitive?

It seems clear that the answers to Questions 1 and 2 are highly relevant to Question 3.

Graph Metrics and Movement

I want to shift focus to another type of question suggested by $G_{8N}$. Let's back up a bit, and think about the familiar 4-neighborhood graph $G_{4N}$. It's not hard to see that the minimum-distortion embedding of $G_{4N}$ also has distortion $\sqrt{2}$. (You have to blow up the grid by a $\sqrt{2}$ factor to make the diagonals long enough.) Yet $G_{4N}$ seems considerably more natural as a discrete representation of movement in the plane somehow. Why?
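Here is a small Python sketch for $G_{4N}$, whose graph distance is the taxicab distance $|\Delta x| + |\Delta y|$. It checks, on a finite window, that the unscaled grid contracts the diagonals (so it is not an embedding under our normalization), while the grid blown up by $\sqrt{2}$ is an embedding with distortion $\sqrt{2}$. It only examines this particular embedding, not all embeddings, and the helper names are hypothetical.

```python
import itertools
import math

def manhattan(u, v):
    # Shortest-path distance in G_4N (the taxicab metric).
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def distortion_scaled_grid(scale, radius):
    # Embedding F(i, j) = scale * (i, j) on the window [-radius, radius]^2.
    # Returns None if F violates ||F(u) - F(v)|| >= d_G(u, v) for some pair.
    pts = [(i, j) for i in range(-radius, radius + 1)
                  for j in range(-radius, radius + 1)]
    worst = 0.0
    for u, v in itertools.combinations(pts, 2):
        euclid = scale * math.dist(u, v)
        d = manhattan(u, v)
        if euclid < d - 1e-12:  # small tolerance for floating-point rounding
            return None         # some pair is contracted: not an embedding
        worst = max(worst, euclid / d)
    return worst

print(distortion_scaled_grid(1.0, 3))           # None: diagonals are too short
print(distortion_scaled_grid(math.sqrt(2), 3))  # about 1.41421, from axis-aligned pairs
```

After scaling, the diagonals are exactly long enough, and the worst ratio moves to the axis-aligned pairs.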

I think the answer is that, with the usual embedding of $G_{4N}$, the graph distances $d_G(u, v)$ correspond to actual Euclidean travel-distances, under the restricted form of paths in which we confine ourselves to the line segments between vertices. (You can see why this metric is sometimes called "taxicab geometry.") By contrast, the usual embedding of $G_{8N}$ doesn't have this interpretation.

However, consider the following system of paths connecting points in the plane:

If we restrict ourselves to these paths, and if we make those squiggles the right length, then shortest Euclidean travel-distances actually do correspond to distances in the graph $G_{8N}$! This is so, even if we're allowed to switch paths at the crossover points.

So $G_{8N}$ is not totally weird as a discrete model of movement in the plane; it just corresponds to an odder restriction of movement.

More generally, say that a graph $G$, with nonnegative edge-weights ("lengths"), is an obstructed-plane graph, if there is an embedding of $G$ into $\mathbf{R}^2$ along with a set of "obstructions" (just a point-set in $\mathbf{R}^2$), such that shortest paths in $G$ correspond to shortest obstruction-avoiding paths in $\mathbf{R}^2$.

Question 4: What is the complexity of deciding whether a given graph (finite, say) is an obstructed-plane graph?

It simplifies things a bit to realize that, in trying to find an obstructed-plane realization of a graph $G$, the obstructions may as well be all of the plane except the intended shortest paths between all pairs of points. Using this observation, we can at least show that our problem is in NP. Is it NP-complete?

Any planar graph, with arbitrary nonnegative edge-weights, is clearly an obstructed-plane graph. But we've seen that $G_{8N}$, a non-planar graph, is also an obstructed-plane graph. (Quick---prove that $G_{8N}$ is non-planar!) The essence of the problem is to find systems of paths in the plane which, though they may cross, do not introduce any undesired "short-cuts" between vertices.
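One possible route to the non-planarity exercise, sketched in Python under my own naming: a simple planar graph with $n \geq 3$ vertices has at most $3n - 6$ edges (a consequence of Euler's formula), so it suffices to find a finite window of $G_{8N}$ with too many edges. The code counts king-move edges by brute force.

```python
def king_window(k):
    """Vertex and edge counts of the k-by-k window of G_8N (king moves)."""
    pts = [(i, j) for i in range(k) for j in range(k)]
    edges = {frozenset((u, v)) for u in pts for v in pts
             if u != v and max(abs(u[0] - v[0]), abs(u[1] - v[1])) == 1}
    return len(pts), len(edges)

for k in range(3, 7):
    n, m = king_window(k)
    # A simple planar graph with n >= 3 vertices has at most 3n - 6 edges.
    verdict = "non-planar" if m > 3 * n - 6 else "inconclusive"
    print(k, n, m, 3 * n - 6, verdict)
```

The $3 \times 3$ and $4 \times 4$ windows are inconclusive (20 edges against a bound of 21, and 42 against 42), but the $5 \times 5$ window has 72 edges against a bound of 69. Since it is a subgraph of $G_{8N}$, the whole graph is non-planar.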

Now suppose we draw $G$ in the plane, along with a collection of "intended" shortest paths between each pair of vertices. (That is, we will obstruct the rest of the plane, and hope that these paths are indeed shortest in what remains.) We expect that the intended $u$-$v$ path is of Euclidean length $d_G(u, v)$.

A simple observation is that in order to avoid short-cuts, all 4-tuples of distinct vertices $u, u', v, v'$ must obey the following property:

$\bullet$ If the "intended path" from $u$ to $v$ intersects the intended path from $u'$ to $v'$, then

\[ d_G(u, v) + d_G(u', v') \geq d_G(u, u') + d_G(v, v') \]

and

\[ d_G(u, v) + d_G(u', v') \geq d_G(u, v') + d_G(u', v) . \]
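The condition is easy to check mechanically for a proposed crossing. Below is a minimal Python sketch, with the graph distance passed in as a function `d` and $u', v'$ written as `u2, v2`; the two examples (a unit square of $G_{8N}$, and a path graph) are my own illustrations.

```python
def crossing_ok(d, u, v, u2, v2):
    """Necessary condition for intended paths u-v and u2-v2 to cross without
    creating a short-cut: neither re-pairing of the endpoints is shorter."""
    total = d(u, v) + d(u2, v2)
    return total >= d(u, u2) + d(v, v2) and total >= d(u, v2) + d(u2, v)

def king(a, b):
    # Graph distance in G_8N.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def path(a, b):
    # Graph distance in the path graph 0 - 1 - 2 - 3.
    return abs(a - b)

# The two diagonals of a unit square in G_8N: the condition holds, with equality.
print(crossing_ok(king, (0, 0), (1, 1), (1, 0), (0, 1)))  # True
# Letting the intended 0-1 path cross the intended 3-2 path fails the test.
print(crossing_ok(path, 0, 1, 3, 2))  # False
```

In the path-graph example the failure is exactly the short-cut the condition guards against: walking from 0 to the crossing point and on to 3 costs at most $d(0, 1) + d(3, 2) = 2$, less than $d(0, 3) = 3$.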

Question 5: Is the necessary condition above also sufficient?

In a narrow sense, the answer to Question 5 is No: it's possible to draw a graph in this way and still introduce undesired short-cuts. My real question is whether, from a graph drawing with the property above, we can lengthen and shorten segments, without changing the topological structure of the drawing, so as to get the desired obstructed-plane realization.

It may be foolish to hope for such a simple condition to be sufficient. Also, an affirmative answer to Question 5 wouldn't seem to imply any new complexity upper bound for our problem (except perhaps to speed up the NP verification a bit). I ask only because I find the question interesting, and wasn't able to cook up any counterexamples in my brief attempt.

Joint computational complexity, and the "buy-one-get-one-free conjecture"

2011-06-15T10:21:00.001-04:00

Below is a simple-to-state open question, stemming from this paper of mine from CCC'09. First, I'll state the question; then I'll give some background, explaining how it's an instance of a more general and significant problem.

The question

Let's consider the standard two-party model of communication complexity. Given inputs x and y to Alice and Bob respectively, suppose there are 3 functions the two parties are interested in evaluating on these inputs---let's call them F(x, y), G(x, y), H(x, y).

Question: is there a collection of total functions F, G, H, and a positive value T, such that:

(i) any one of F, G, H requires at least T bits of communication to compute;

(ii) any two of F, G, H can be computed in (1.01 T) bits of communication, on a common input (x, y);

(iii) but, computing all three of F, G, H on a common input requires at least (1.99 T) bits of communication.

I believe such a collection exists. We can call this the 'buy-one-get-one-free conjecture': Think of T as the individual 'price' of the 'items' F, G, H; we want to arrange a special 'deal' where the second item is essentially free, but one has to pay full-price for the third item.

Now if you think about it, what we're looking for is pretty strange. The function F should be efficiently computable in at least two 'essentially different' ways---one of which also gives us the value of G, and one of which gives H---yet there should be no efficient scheme to compute F that gives us G and H simultaneously. (This property seems easier to contrive when the inputs x, y are assumed to have a special, correlated form; I rule this out by insisting that F, G, H be total functions.)

The question makes equal sense when posed for other models of computation. In my paper, I proved the corresponding conjecture in the decision tree model of computation, as a special case of a more general result--see below. Communication complexity could be a reasonable model to attack next.

Please note: While this conjecture may require lower-bounds expertise to resolve, I believe that anyone with a creative spark could make an important contribution, by coming up with a good set of candidate functions F, G, H. Please feel encouraged to share any ideas you might have.

Background on the question

Let cc(F) denote the (deterministic) communication complexity of computing F(x, y). Next, let cc(F, G) denote the communication complexity of computing F(x, y) and G(x, y)---on the same input-pair (x, y). We define cc(F, H), cc(G, H), and cc(F, G, H) similarly.

Together, we think of these various quantities as summarizing the 'joint complexity' of the collection F, G, H. Of course, this notion can be extended to collections of k > 3 functions; the joint complexity is summarized by giving the communication complexity of all 2^k subsets of the collection. Let JC denote the function that takes as input a k-bit vector, and returns the complexity of computing the corresponding subcollection. So, in our 3-function example, we have

JC(1, 1, 0) = cc(F, G) and JC(0, 0, 1) = cc(H).

The question we want to ask is: what kinds of behavior are possible with the joint complexity, if we allow the functions F, G, H, etc. to be chosen arbitrarily? In other words, what different types of 'efficiencies' can arise in a collection of computational tasks (in the communication model)?

A little thought reveals some obvious constraints:

1. the joint complexity function JC must always be nonnegative and integral-valued, with JC(0) = 0.

2. monotonicity: Enlarging the subset of the functions to be computed cannot decrease the complexity. For example, we always have cc(F, G) >= cc(F), which translates to JC(1, 1, 0) >= JC(1, 0, 0).

3. subadditivity: Taking the union of two subsets of functions to be computed cannot increase the complexity beyond the sum of the individual complexities of the subsets. For example, cc(F, G, H) <= cc(F, G) + cc(H), since we can always compute (F, G) in an optimal fashion first, then compute H optimally afterwards.

(Technically, this assumes that in our model both players always know when a communication protocol halts, so that they can combine two protocols sequentially without any additional overhead. No big deal, though.)

Now, a little further thought reveals that… well, there really aren't any other obvious, general constraints on the joint complexity! Let's call C an Economic Cost Function (ECF) if it obeys constraints 1-3. We are tempted to conjecture that perhaps every ECF is in fact equal to the joint complexity (in the communication model) of some particular collection of functions.
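To make constraints 1-3 concrete, here is a small Python checker (my own sketch, not taken from the paper). It represents a cost function on k tasks as a dictionary from k-bit tuples to costs, mirroring the JC notation above, and applies it to the buy-one-get-one-free pattern with T = 100, using integer costs 101 and 199 for 1.01T and 1.99T. The helper `bogof` is hypothetical.

```python
from itertools import product

def is_ecf(C, k):
    """Check constraints 1-3 for a cost function C on subsets of k tasks,
    where C maps k-bit tuples (indicator vectors of subsets) to costs."""
    subsets = list(product((0, 1), repeat=k))
    if C[(0,) * k] != 0:
        return False
    for S in subsets:
        if C[S] < 0 or C[S] != int(C[S]):              # 1. nonnegative, integral
            return False
        for T in subsets:
            is_subset = all(a <= b for a, b in zip(S, T))
            if is_subset and C[S] > C[T]:              # 2. monotonicity
                return False
            union = tuple(a | b for a, b in zip(S, T))
            if C[union] > C[S] + C[T]:                 # 3. subadditivity
                return False
    return True

def bogof(T):
    # Hypothetical helper: one item costs T, any two cost 1.01T, all three 1.99T
    # (integer arithmetic, exact when T is a multiple of 100).
    cost = {0: 0, 1: T, 2: T + T // 100, 3: 2 * T - T // 100}
    return {S: cost[sum(S)] for S in product((0, 1), repeat=3)}

print(is_ecf(bogof(100), 3))  # True
```

The T = 1 instance (costs 1, 1, 2) also passes, which is the point of the counterexample discussed next: passing these checks says nothing about whether the function is actually the joint complexity of some collection.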

There are two things wrong with this conjecture. First, it's false, as can be seen by a simple counterexample: namely, the "buy-one-get-one-free" example, with T set to 1. That's how I stumbled onto this example, and it is one reason why I find it interesting.

However, if we relax the problem, and just ask to realize some scalar multiple of C as a joint complexity function, this counterexample loses its force.

The second thing wrong with the conjecture (in its relaxed form) is that, even if true, it'd likely be impossible to prove. This is because determining the exact computational cost of even modestly-complicated tasks is just way too hard. So I propose a doubly-relaxed form of the conjecture: I conjecture that if C is an ECF, then there is a joint complexity function that is a good pointwise approximation to some scalar multiple of C. (Here we allow a (1 +- eps) multiplicative error.)

In my paper, I managed to prove the corresponding conjecture for the model of decision trees (aka deterministic query algorithms). Several interesting ingredients were needed for the proof. Now, why do I believe the conjecture should also hold true for the communication model? In a nutshell, I think it should be possible to 'embed' tasks in the query model into the communication model, by a suitable distributed encoding of each bit, in such a way that the relative costs of all computational tasks are approximately preserved. If this could be shown, the result in the communication model would follow from my result for decision trees. (See the paper for more details.)

We may not be ready for an attack on the general conjecture, however. In particular, we seem to require a much better understanding of so-called 'Direct Sum problems' in communication complexity. Thus, I offer the 'buy-one-get-one-free conjecture' as a simpler, more concrete problem on which we can hope to make progress sooner.

In the decision tree model, my result allows us to realize an ECF of the 'buy-one-get-one-free' type as a joint complexity function; but I don't know of any method for this that's significantly simpler than my general construction. Even finding such a simpler method in the decision tree model would be a very nice contribution, and might lead to new ideas for the more general problem.

An exciting new textbook, and a request

2011-05-09T10:53:00.000-04:00

Today I'd like to put out an appeal to readers. If you have a solid grasp of English and an interest in circuit complexity (no expertise required!), please consider helping proofread the forthcoming book "Boolean Function Complexity: Advances and Frontiers" by Stasys Jukna.

Stasys (whom I recently had the pleasure to meet at an enjoyable Dagstuhl seminar in Germany) is a talented researcher and a kind, gracious person; he's also worked tirelessly to produce high-quality textbooks for our community. I'm a long-time fan of his "Extremal Combinatorics: With Applications in Computer Science" which contains many gems of combinatorial reasoning in complexity theory.

His latest manuscript, to be published soon, promises to be a leading reference for circuit lower-bounds research. Although I've only read parts, it clearly achieves both depth and breadth; I really think anyone in the field could learn something new and useful here.

The main cause for concern is that Stasys (who hails from Lithuania) is not a native English speaker, and the text needs work in places to become grammatically correct and idiomatic. Also, it seems a full-time copy-editor is not available for this project.

So readers: by volunteering to proofread a chapter or two, you'll be doing a valuable service for present and future students of complexity theory. Language editing is the essential thing--you can skim over the equations, so it really doesn't take that long. (Of course, mathematical feedback is also welcome.)

The manuscript is available here; use the following login info:

User: friend
Password: catchthecat

You can email Stasys your comments. The text is long (500+ pages), so if you do or are planning to do some editing, please enter your name, the chapters you'll edit, and a timeframe in the comments below. This will allow others to maximize our coverage of the draft.

Thanks in advance for your help!

Harassment Policies for Theory Conferences

2010-12-10T00:38:00.000-05:00

Following offline conversations and recent discussions on other blogs (hat-tip to Anna and David), I want to promote the Geek Feminism Blog initiative asking computing conferences to adopt explicit policies against sexual harassment. Bringing such policies to theory conferences that don't yet have them is an important step. (Note that this would mean putting them on conference websites and preparing conference staff. Just having some boilerplate document hidden somewhere on the IEEE or ACM websites is not enough.)

What is the value of such a policy? Geek Feminism provides a policy template whose intro spells it out well. Such a policy

"sets expectations for behavior at the conference. Simply having an anti-harassment policy can prevent harassment all by itself...

"...it encourages people to attend who have had bad experiences at other conferences...

"...it gives conference staff instructions on how to handle harassment quickly, with the minimum amount of disruption or bad press for your conference."

Stating such a policy would cost nothing, and local conference staff could prepare for their roles using anti-harassment training materials, which abound on the web -- I invite others to suggest good ones.

So why would we hesitate to adopt such policies? I will suggest three possible reasons, and explain why they're unconvincing.

First, there is a certain tendency to deride anti-harassment training as "sensitivity training" and as stating the obvious. But whether or not most of us know how to treat others respectfully, responding to disrespectful treatment is another story. Conference staff need to know there are circumstances under which they can and should reprimand attendees or even eject them, and they need to mentally rehearse for these difficult tasks. Attendees need to know the staff are ready to help.

Second, some might object that while harassment may be a major problem in other parts of the computing/tech world, it's less of a problem in our mature, enlightened theory community. Of course, this would be a self-serving belief without empirical support. I'm not aware of any systematic efforts to track harassment incidents at theory conferences, although Geek Feminism maintains a wiki record of incidents in computing/tech more broadly -- I hope theory conference-goers will find and use it or something similar. But if we can agree that sexual harassment is seriously wrong -- harmful to individuals and the community when it occurs -- surely we can take the time to state this publicly and prepare ourselves to deal with it, whatever its frequency.

Third, might an anti-harassment policy inhibit our freedom of expression too much or make people afraid to interact? Let me turn this question around. Almost all universities and major employers have explicit anti-harassment policies (here's MIT's, for example). Most of us support these precautions and don't feel oppressed by the policies. Why should conferences, which are outgrowths of the academic system, be different? Do we believe there is some special spirit of lawlessness that we need to protect at conferences, and only at conferences?

Of course not. So I support the harassment-policy initiative, and encourage others to do so as well.

ECCC: what authors should know

2010-11-04T13:38:00.000-04:00

Anyone interested in computational complexity should be aware of ECCC, the most important and widely-used online repository for complexity papers. (Depending on your specific interests, various sections of arxiv, along with the Cryptology eprint Archive, may be equally important to follow.)

Unfortunately, the technical side of ECCC's submission process is arguably broken, and seems to trip up many if not most authors. Here are the issues I'm aware of:

1: no preview function.

Unlike arxiv, ECCC offers Latex support for on-site abstracts, so you can have as many funky symbols as you want. Good thing? No, because the site offers no way to preview what the compiled math will look like. This results in bizarre spacing effects and compile errors. (The most frequent problem, I think, is authors trying to use their own custom macros in the abstract, or commands from outside the Latex fragment supported by the site.)

Nor is it possible to preview what your document will look like (assuming it's accepted). This brings us to the second point:

2: hyperrefs are broken.

Many authors these days like to use internal hyperlinks in their document (provided by the hyperref package in Latex). This way, in case the reader forgets what Lemma 14.5 said, there's a link to it every time it's invoked. ECCC is happy to accept hyperref'd papers, and when they appear on the site, they'll have the same appearance you've chosen to give them. Unfortunately, in most cases the damn things won't do anything when you click on them.

Faulkner wanted to print The Sound and the Fury in multi-colored ink. I happen to like the look of colorful hyperrefs, even broken ones, but I still feel like a fool when they're all over a paper of mine.

3: keywords are nearly useless.

There is so little standardization in the use of keywords, and they're handled so rigidly, that clicking on them is often a waste of time. For example, the keywords 'algorithmic meta-theorems' and 'algorithmic meta theorems' bring up one paper each -- two lonely souls separated by a hyphen. (The search tool and the browse-able list of keywords are somewhat more useful, but still probably less so than googling.) Mistyped keywords are another danger. I've also seen author names that, when clicked, bring up a proper subset of that author's work for no apparent reason.

Why does this matter?

I think all this constitutes a serious problem. But one possible objection to my view is that authors can always post revisions to their papers -- fixing abstracts, documents, and keywords in one stroke.

This might be an OK solution, except for the empirical fact that almost nobody does this (myself included). I think, and others have also opined, that this is because people are afraid to revise. Presumably, they fear there's a widespread perception that posting revisions = mistakes or sloppiness.

Whether or not this perception is actually widespread, we should stand visibly against it, to loosen the grip of pointless anxieties and to improve the quality of available papers. A more reasonable attitude is that having early access to preprints is a good thing, but that such papers will almost always have imperfections. Revising a paper within a few weeks or months of its release ought to be a sign of conscientious authors, not sloppy ones. This holds doubly in the context of a messed-up submission system.

Of course, there is such a thing as too many revisions, and it is possible to post a paper too early in the editing process. Where that line should be drawn is a tough topic that deserves its own discussion.

What's the solution?

We can and should expect more from ECCC. Specifically, a preview function for abstracts and documents would be a key improvement. But at the least, there should be a clear list of warnings to authors about common submission errors.

The web administrators are aware of these issues, and have been for some time; but we as users can each do our part to communicate the importance and urgency of fixing these problems. This is especially true of users on the site's scientific board.

In the meantime, what should you do if your submission doesn't turn out the way you expected? Last time this happened to me, I contacted a web admin, a friendly guy who was able to fix part of the problem for me, without resorting to the dreaded revision step. This might work for you as well.

Injective polynomials

2010-07-21T19:27:00.000-04:00

From a paper of MIT's Bjorn Poonen, I learned of an amazingly simple open problem. I'll just quote the paper (here Q denotes the rational numbers):

"Harvey Friedman asked whether there exists a polynomial f(x, y) in Q[x, y] such that the induced map Q x Q --> Q is injective. Heuristics suggest that most sufficiently complicated polynomials should do the trick. Don Zagier has speculated that a polynomial as simple as x^7 + 3y^7 might be an example. But it seems very difficult to prove that any polynomial works. Both Friedman's question and Zagier's speculation are at least a decade old... but it seems that there has been essentially no progress on the question so far."
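One can at least poke at such questions experimentally. Here is a minimal Python sketch of a brute-force collision search over rationals with small numerators and denominators, using exact `fractions.Fraction` arithmetic; the function names are mine. It quickly finds collisions for obviously non-injective polynomials such as x^2 + y^2. Running it on Zagier's candidate x^7 + 3y^7 only explores a finite box, so finding nothing there would be weak evidence at best and proves nothing.

```python
from fractions import Fraction
from itertools import product

def small_rationals(max_num, max_den):
    """All distinct rationals p/q with |p| <= max_num and 1 <= q <= max_den."""
    return sorted({Fraction(p, q)
                   for p in range(-max_num, max_num + 1)
                   for q in range(1, max_den + 1)})

def find_collision(f, max_num, max_den):
    """Search the finite box for two distinct points (x, y) with equal f-values.

    Returns a pair of points, or None if the box contains no collision.
    """
    seen = {}
    for point in product(small_rationals(max_num, max_den), repeat=2):
        value = f(*point)
        if value in seen:
            return seen[value], point
        seen[value] = point
    return None

# x^2 + y^2 is not injective: for instance (x, y) and (-x, y) collide.
print(find_collision(lambda x, y: x * x + y * y, 2, 2))
# Zagier's candidate; whatever this prints, a finite box settles nothing.
print(find_collision(lambda x, y: x**7 + 3 * y**7, 3, 3))
```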

Poonen shows that a certain other, more widely-studied hypothesis implies that such a polynomial exists. Of course, such a polynomial does not exist if we replace Q by R, the reals. In fact any injection R x R --> R must be (very) discontinuous.

Suppose an injective polynomial f could be identified, answering Friedman's question; it might then be interesting to look at `recovery procedures' to produce x, y given f(x, y). We can't hope for x, y to be determined as polynomials in f(x, y), but maybe an explicit, fast-converging power series or some similar recipe could be found.

Finally, all of this should be compared with the study of injective polynomials from the Euclidean plane to itself; this is the subject of the famous Jacobian Conjecture. See Dick Lipton's excellent post for more information.

Wit and Wisdom from Kolmogorov

2009-08-10T11:29:00.000-04:00

I just learned about a result of Kolmogorov from the '50s that ought to interest TCS fans. Consider circuits operating on real-variable inputs, built from the following gates:

-sum gates, of unbounded fanin;
-arbitrary continuous functions of a single variable.

How expressive are such circuits? Kolmogorov informs us that they can compute any continuous function of n variables on the unit cube [0,1]^n. Wow! In fact, one can achieve this with poly(n)-sized, constant-depth formulas alternating between sums and univariate continuous functions (two layers of each type, with the final output being a sum).
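For reference, here is the shape of the representation as it is usually stated (the Kolmogorov-Arnold superposition theorem; this is the standard textbook form, not a quotation from Vitushkin): for every continuous f on [0,1]^n there are continuous univariate functions $\Phi_q$ and $\phi_{q,p}$ such that

\[ f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right) . \]

Read as a circuit, this uses n(2n+1) inner univariate gates feeding 2n+1 sum gates, then 2n+1 outer univariate gates feeding one final sum: two layers of each type, and O(n^2) gates in all.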

Note that sums can also be computed in a tree of fanin-two sums, so this theorem (whose proof I've not yet seen) tells us that fanin-two continuous operations capture continuous functions in the same way that fanin-two Boolean operations capture Boolean computation (actually, even more strongly in light of the poly-size aspect).

This result (strengthening earlier forms found by Kolmogorov and by his student Arnol'd) may, or may not, negatively answer one of Hilbert's problems, the thirteenth--it's not entirely clear even to the experts what Hilbert had intended to ask. But a fun source to learn about this material is a survey/memoir written by A. G. Vitushkin, a former student of Kolmogorov. [a gated document, unfortunately...]

The freewheeling article starts off with Hilbert's problem, but also contains interesting anecdotes about Kolmogorov and the academic scene he presided over in Moscow. Towards the end we get some amusing partisan invective against Claude Shannon, who, scandalously, "entirely stopped his research at an early stage and still kept his position of professor at the Massachusetts Institute of Technology". Vitushkin (who passed away in 2004) apparently bore a grudge over the insufficient recognition of Soviet contributions to information theory. My favorite story relates a not-so-successful visit by Shannon to meet Kolmogorov and features a deft mathematical put-down:

"Kolmogorov was fluent in French and German and read English well, but his spoken English was not very good. Shannon, with some sympathy, expressed his regret that they could not understand each other well. Kolmogorov replied that there were five international languages, he could speak three of them, and, if his interlocutor were also able to speak three languages, then they would have no problems."

Additive combinatorics: a request for information

2009-05-20T10:22:00.000-04:00

Given a subset A of the integers mod N, we can ask, how many 4-element `patterns' appear in A? A pattern is an equivalence class of size-4 subsets of A, where two 4-sets S, S' are considered the same pattern if S = S' + j (mod N) for some j.

Clearly the number of patterns is at most |A|-choose-4; but it can be much less: if A is a consecutive block, or more generally an arithmetic progression, the number of patterns is on the order of |A|^3.

So my question is: if the number of patterns is `much less' than |A|^4, what nice structure do we necessarily find in A?

I believe that similar questions for 2-patterns have satisfactory answers: then the hypothesis is just that the difference-set (A - A) is small. In this case I believe A is 'close' to a (generalized) arithmetic progression, although actually I'm having trouble finding the relevant theorem here too (most references focus on sumsets (A + A), for which Freiman's Theorem applies).

Thanks in advance for any pointers!

Algebraic and Transcendental

2009-03-24T15:45:00.000-04:00

A number is called algebraic if it is the root of a nonzero polynomial with integral coefficients (or, equivalently, rational coefficients); otherwise it is transcendental.

Item: there exists a nonintegral c > 1 such that c^n 'converges to the integers': that is, for any eps > 0 there's an N > 0 such that c^n is within eps of some integer, for all n > N. (I think the golden mean was an example, but I can't find or remember the reference book at the moment.)

Item: it is apparently open whether any such number can be transcendental.
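On the first item: the golden mean does qualify, for a clean reason. With phi = (1 + sqrt 5)/2 and psi = (1 - sqrt 5)/2, the sum phi^n + psi^n is the nth Lucas number, an integer, while |psi| < 1. (Numbers with this property are known as Pisot, or Pisot-Vijayaraghavan, numbers.) Here is a quick high-precision check using Python's standard decimal module. The precision of 80 digits is an arbitrary choice, ample for the n used here, and n starts at 2 because |psi| > 1/2, so for n = 1 the nearest integer to phi is 2 rather than the Lucas number 1.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # arbitrary; far more digits than phi**n needs for n <= 40

PHI = (1 + Decimal(5).sqrt()) / 2

def lucas(n: int) -> int:
    """Lucas numbers: L_0 = 2, L_1 = 1, L_{k+1} = L_k + L_{k-1}."""
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def distance_to_integer(x: Decimal) -> Decimal:
    """Distance from x to the nearest integer."""
    return abs(x - x.to_integral_value())

for n in (2, 5, 10, 20, 40):
    power = PHI ** n
    # Nearest integer is the Lucas number; the gap shrinks like |psi|^n.
    print(n, power.to_integral_value() == lucas(n), distance_to_integer(power))
```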

***

Next up, two questions of my own on algebraic numbers. Possibly easy, possibly silly, but I don't know the answers.

Define the interleave of two numbers

b = 0.b_1b_2b_3... and c = 0.c_1c_2c_3...
(both in [0, 1], and using the 'correct' binary expansions) as

b@c = 0.b_1c_1b_2c_2...
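To make the definition concrete, here is a small sketch (function names are my own) that computes the first bits of b@c exactly for rational inputs in [0, 1). I assume the 'correct' expansion of a dyadic rational is the terminating one, which is what repeated doubling produces. Since interleaving two eventually periodic bit strings gives an eventually periodic one, b@c is rational whenever b and c are; for example 1/3 @ 2/3 = 0.(0110) = 6/15 = 2/5.

```python
from fractions import Fraction
from itertools import islice

def binary_digits(x: Fraction):
    """Yield the binary digits of x in [0, 1) by repeated doubling.

    For dyadic rationals this gives the terminating expansion (trailing 0s).
    """
    if not 0 <= x < 1:
        raise ValueError("x must lie in [0, 1)")
    while True:
        x *= 2
        bit = 1 if x >= 1 else 0
        x -= bit
        yield bit

def interleave_bits(b: Fraction, c: Fraction, k: int) -> list:
    """First k binary digits of b@c = 0.b_1 c_1 b_2 c_2 ..."""
    pairs = islice(zip(binary_digits(b), binary_digits(c)), (k + 1) // 2)
    return [bit for pair in pairs for bit in pair][:k]

print(interleave_bits(Fraction(1, 3), Fraction(2, 3), 8))   # [0, 1, 1, 0, 0, 1, 1, 0]
```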

1) Suppose b, c are algebraic. Must b@c be algebraic?

2) The other direction. Suppose b@c is algebraic. Must b, c be algebraic?

In both cases, I am inclined towards doubt.

***

Final thoughts (and these are observations others have made as well)... computational complexity theory seems to have certain things in common with transcendental number theory:
-an interest in impossibility/inexpressibility results;

-markedly slow progress on seemingly basic questions--is (pi + e) irrational? does P = NP?

-attempts to use statistical and otherwise 'constructive' properties as sieves to distinguish simple objects from complicated ones.

So, could complexity theorists benefit from interacting with and learning from transcendental number theorists? If nothing else, looking back on their long historical march might teach us patience and an appreciation for incremental progress.

A Rather Elegant Solution

2008-12-03T17:53:00.000-05:00

So I'm co-writing a survey paper for a course project, and struggling with the conflicting demands of completeness and brevity. As if charmed, while taking a break I happen upon a '99 paper called "50 years of Bailey's Lemma" by S. Warnaar whose abstract really resonates:

"Half a century ago, The Proceedings of the London Mathematical Society published W. N. Bailey’s influential paper Identities of the Rogers–Ramanujan type... To celebrate the occasion of the lemma’s fiftieth birthday we present a history of Bailey’s lemma in 5 chapters...
Due to size limitations of this paper the higher rank [42, 40, 43, 41, 14, 60] and trinomial [11, 59, 19] generalizations of the Bailey lemma will be treated at the lemma’s centennial in 2049."

Excitement and Probability

2008-10-31T10:36:00.001-04:00

Elections, sporting events, and other competitions can be exciting. But there is also a sense in which they are almost always dull, and this can be proved rigorously. Allow me to explain.

(What follows is an idea I hatched at UCSD and described to Russell Impagliazzo, who had, it turned out, discovered it earlier with some collaborators, for very different reasons. I wouldn't be surprised if it had been observed by others as well. The proof given here is similar to my original one, but closer to the more elegant one Russell showed me, and I'll cite the paper when I find it.)

I want to suggest that a competition is dull to watch if one side is always heavily favored to win (regardless of whether it's the side we support), or if the two sides muddle along in a dead heat until some single deciding event happens. More generally, I'd like to suggest that excitement occurs only when there is a shift in our subjective probabilities of the two (let's say just two) outcomes.

Without defending this suggestion, I'll build a model around it. If anyone is upset that this notion of excitement doesn't include the smell of popcorn, the seventh-inning stretch, or any substantive modelling of the competition itself, there's no need to read any further.

Assume we are a rational Bayesian spectator watching a competition unfold, and that we receive a regular, discrete sequence of update information $X_1, X_2, ... X_n$ about a competition (these update variables could be bits, real numbers, etc.). Let $p_0, p_1, ... p_n$ be our subjective probabilities of 'victory' (fixing an outcome we prefer between the two) at each stage, where $p_t$ is a random variable conditioning on all the information $X_1, ... X_t$ we've received at time $t$.

For $t > 0$, let's define the excitement at time $t$ as $EXC_t = |p_{t} - p_{t - 1}|$. This random variable measures the 'jolt' we presume we'll get by the revision of our subjective probabilities on the $t$th update.

Define the total excitement $EXC$ as the sum of all $EXC_t$ 's. Now, we only want to watch this competition in the first place if the expected total excitement is high; so it's natural to ask, how high can it be?

We needn't assume that our method for updating our subjective probability corresponds to the 'true' probabilities implied by the best possible understanding of the data. But let's assume it conforms at least internally to Bayesian norms: in particular, we should have $E[p_{t + 1} | p_{t} = p] = p$.

An immediate corollary of this assumption, which will be useful, is that

\[E[p_t p_{t + 1}] = \sum_p Prob[p_t = p]\cdot p E[p_{t + 1}|p_t = p]\]
\[= \sum_p Prob[p_t = p]\cdot p^2 = E[p_t^2].\]
OK, now rather than look at the expected total excitement, let's look at the expected sum of squared excitements, an often-useful trick which allows us to get rid of those annoying absolute value signs:
\[E[EXC_1^2 +EXC_2^2 + \ldots + EXC_{n }^2]\]

\[= E[(p_1 - p_0)^2 + \ldots + (p_n - p_{n - 1})^2 ] \]

\[= E[p_0^2 + p_n^2 + 2\left(p_1^2 + \ldots + p_{n - 1}^2\right) \]

\[ \quad{}- 2\left(p_1 p_0 + p_2 p_1 + \ldots + p_n p_{n - 1} \right) ] \]

\[= E[p_0^2] + E[p_n^2] + 2(E[p_1^2] + \ldots + E[p_{n - 1}^2]) \]

\[ \quad{} - 2(E[p_0^2] + \ldots + E[p_{n - 1}^2] )\]

(using linearity of expectation and our previous corollary). Now we get a bunch of cancellation, leaving us with

\[= E[p_n^2] - E[p_0^2].\]

This is at most 1. So if we measured excitement at each step by squaring the shift in subjective probabilities, we'd only expect a constant amount of excitement, no matter how long the game!

Now, rather crudely, if $Y \geq 0$ and $E[Y] \leq 1$ then $E[\sqrt{Y}] \leq 2$ (since $\sqrt{Y} \leq 1 + Y$; Jensen's inequality would even give $E[\sqrt{Y}] \leq 1$). We also have the general '$\ell_1$ vs $\ell_2$' inequality
\[|Y_1| + ... + |Y_n| \leq \sqrt{ n \cdot (Y_1^2 + ... + Y_n^2)} .\]

Using both of these, we conclude that
\[E[EXC_1 + \ldots + EXC_n] \leq E\left[\sqrt{n \cdot \left(EXC_1^2 + \ldots + EXC_n^2\right)} \right] \leq 2 \cdot \sqrt{n} .\]
Thus we expect at most $2 \sqrt{n}$ total excitement, for an expected 'amortized excitement' of at most $2/\sqrt{n} = o(1)$.
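To see these bounds in action, here is a sketch that computes the relevant expectations exactly (up to floating-point rounding) for one simple game of my own choosing, not part of the argument above: n fair +1/-1 updates with n odd, our side winning if the final total is positive, and $p_t$ the exact conditional win probability given the running total. Since $p_0 = 1/2$ and $p_n$ is 0 or 1, the telescoping identity predicts an expected sum of squared excitements of exactly $1/2 - 1/4 = 1/4$, and Cauchy-Schwarz then caps the expected total at $\sqrt{n}/2$, inside the cruder $2\sqrt{n}$.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def win_prob(n: int, t: int, s: int) -> float:
    """P[final total > 0 | running total s after t of n fair +-1 steps]."""
    m = n - t  # steps still to come
    good = sum(math.comb(m, k) for k in range(m + 1) if s + 2 * k - m > 0)
    return good / 2 ** m

def excitement_moments(n: int) -> tuple:
    """Exact E[sum_t EXC_t] and E[sum_t EXC_t^2] for the random-walk game."""
    if n % 2 == 0:
        raise ValueError("use odd n so the game cannot end in a tie")
    total = total_sq = 0.0
    for t in range(1, n + 1):
        for s in range(-(t - 1), t, 2):  # possible totals after t - 1 steps
            weight = math.comb(t - 1, (t - 1 + s) // 2) / 2 ** (t - 1)
            before = win_prob(n, t - 1, s)
            for step in (1, -1):  # the next update is +1 or -1, each w.p. 1/2
                jump = abs(win_prob(n, t, s + step) - before)
                total += weight * 0.5 * jump
                total_sq += weight * 0.5 * jump ** 2
    return total, total_sq

for n in (11, 51, 101):
    total, total_sq = excitement_moments(n)
    print(n, round(total_sq, 12), round(total, 4), round(2 * math.sqrt(n), 4))
```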

Watch $n$ innings, for only $O(\sqrt{n})$ excitement? Give me a break! If $n$ is large, it's better to stay home--no matter what the game.

I would love to see this theory tested against prediction markets like Intrade, which are argued to give a running snapshot of our collective subjective probability of various events. Are the histories as 'low-excitement' as our argument predicts? Even lower? (Nothing we've said rules it out, although one can exhibit simple games which have expected excitement on the order of $\sqrt{n}$.)

And if the histories exhibit more excitement than we'd predict (some sign of collective irrationality, perhaps), is there a systematic way to take advantage of this in the associated betting market? Food for thought. I'd be grateful if anyone knew where to get a record of Intrade's raw day-by-day numbers.

Finally, nothing said above rules out individual games playing out with high excitement, on the order of $n$, but it does say that such outcomes should be infrequent. I believe a more careful martingale approach would show an exponentially small probability of such large deviations (Russell said their original proof used Azuma's inequality, which would probably suffice).

NO on Prop 8

2008-10-24T14:08:00.000-04:00

If you are straight, would you join a club that disallowed gay people? Or keep your membership in a club that stopped admitting them? Would you feel distinguished by your membership? To the contrary, I think most people would feel embarrassed and cheapened by it.

That's why everyone who is or hopes to get married in California (or anywhere, really) should feel alarmed about Proposition 8, why everyone whose tax dollars fund the Marriage Club should feel affronted by this attempt to make it an exclusionary one (by amending the state constitution). Even though we also fund the similar and inclusive Civil-Union Club next door (at least, until the next wacky voter initiative comes along), who can ignore the fence-builders' zeal for insisting on this petty distinction for heterosexual couples, or fail to grasp its underlying message?

Nothing in the existing laws forces clergy of any religion to perform ceremonies or `recognize' marriages they don't accept. There remains, in fact, plenty of space in private life to speak and practice intolerance, but we can't let it be done in the name of all Californians.

For thoughtful posts on the subject, see e.g. Luca's, Ben Casnocha's, and the No on Prop 8 website. They are outspent by the opposition and need help to run TV spots up thru the election, to sway what seems like a very volatile public opinion on this issue.
For an amazing photo-essay on California's ever-expanding diversity, and a powerful argument for mutual acceptance and respect, check out the book Under the Dragon. (Hat-tip to Chaya!)

David Foster Wallace

2008-09-14T13:23:00.000-04:00

About a week ago, at great risk to my studies, I gave in to temptation and checked out 'Infinite Jest' from the library. Last night, ~300 pages into rereading this wonderful novel, I learned that David Foster Wallace, its author, has died in an apparent suicide.

I never met DFW, and know little about his personal life--probably as he wished it--although he seems to have been widely regarded as a kind and generous person. I also never connected too closely with his short fiction and journalistic writing, although I read most of it eagerly. My relationship to DFW centered on `Infinite Jest' (1996), an immense book that captivated me when I discovered it in high school.

Briefly, IJ consists of about 1000 pages of chronologically free-form and narratively heterogeneous episodes from the lives of several characters, generally connected either to the Enfield Tennis Academy of metro Boston or to the nearby Ennet House, a drug rehabilitation center. It is supplemented with ~200 pages of footnotes--a generous helping of asides, extra scenes, and background info (it's set in the slight future, culminating around 2008, better known as the Year of the Depend Adult Undergarment in the era of Subsidised Time).

IJ is at once:

-a lush entertainment, addictive in ways I can't fully explain;

-a barrage of observation, alternately expansive and minute, in which the struggle for readers and characters alike is not so much to find meaning as to hold on to it in the face of various compulsions and distractions, to exercise discernment in a world of spectacular banalities and banal truths;

-a compendium of contemporary striving and suffering, by turns putting up for scrutiny: pleasure and addiction, competitive pursuit, narcissism and dysmorphic thinking, irony/withdrawal as survival strategies in a surreal political climate... and more, all in memorably original fashion;

-a genuinely moving book, never dominated by its theses or formal experiments, with deeply rendered characters who, despite their glaring and costly mistakes along the way, become friends you wish would hang around for another 1000 pages.

It's a huge loss to learn that David Foster Wallace won't publish a follow-up to IJ. It's a blow to learn that the person who produced such a sustained meditation on suffering (and our resources to overcome it) has taken his own life. It's a sadness to know that the spirit that breathed into those pages has passed.

What's left is his remarkable work, and his readership, which I hope will continue to grow. Pick up 'Infinite Jest' today!

Update: Arts & Letters Daily has collected a number of DFW retrospectives (see 'Essays and Opinion'). This site, by the way, is an excellent aggregator of new and noteworthy online writing.

Heh...

2008-07-10T17:50:00.000-04:00

Funny web comic touching on complexity theory.

From Request Comics. Handed across the room by Madhu to Brendan, who showed it to me.

Earlier I'd seen another, slightly more reverent, comics treatment of Interactive Proofs, by Larry Gonick of 'Cartoon History of the Universe' fame.

To briefly discuss the Request Comics scenario: suppose an alien claims to play perfect chess (in all positions) on an n-by-n board; is it possible to efficiently test this in a poly(n)-length randomized interaction?

If there's only one alien, this might be too hard, since the 'correct' generalization of Chess is EXPTIME-complete. If we instead play a PSPACE-complete game like Hex (or Chess with a truncated game length of poly(n)), things change: Shamir's Theorem tells us how a Verifier can be convinced that any particular board set-up is a win.

But this is not the same as Prover convincing Verifier that Prover plays perfectly! However, additional ideas can help. Feigenbaum and Fortnow showed that every PSPACE-complete language L has a worst-case-to-average case reduction to some sampleable distribution D on instances.

This means there exists a randomized algorithm A^B using a black-box subroutine B, such that for all problem instances x, and for all possible black-boxes B that correctly answer queries of form 'is y in L?' with high probability when y is drawn according to distribution D,

A^B(x) accepts with high prob. if x is in L, and rejects w.h.p. if x is not in L.

Thus for the Prover to convince Verifier that Prover is 'effectively able' to solve L in the worst case (and L may encode how to play perfect Hex), it's enough to prove that Prover can decide L w.h.p. over D. Since D is sampleable, Verifier may draw a sample y from D, and the two can run an interactive proof for L on it (or L-complement, if Prover claims y isn't in L). Repeat to increase soundness.

A Must-See Site

2008-05-23T12:53:00.000-04:00

It's called Theorem of the Day. I just found it, so I can't judge how accurate the frequency claim is, but Robin Whitty has stacked up an impressive collection of short, illustrated introductions to major theorems. I like his taste... two that were news to me and I especially liked in my first sampling: Nevanlinna's Five-Value Theorem and The Analyst's Traveling Salesman Problem.

Sign Robin's guestbook, spread the word, and recommend a theorem of CS theory! (I see at least 3 already, if you count unsolvability of Diophantine equations.)

Random bits

2008-05-15T14:43:00.000-04:00

This morning I was delighted to learn that Alan Baker, the Swarthmore professor whose Philosophy of Science class I took senior year, recently won the US Shogi championships--read his account here. Congrats, Prof. Baker!
I never learned Shogi, although Go--another Asian board game--was a big part of my youth and probably a decisive factor in my eventual interest in math/CS.

In other news, I am coauthor on a paper, 'The Power of Unentanglement', recently released on ECCC (and headed for CCC '08), jointly with Scott Aaronson, Salman Beigi, Bill Fefferman, and Peter Shor. It's about multiple-prover, single-round, quantum interactive proofs, and I encourage those whose cup of tea that is to take a look.

This is my first appearance in a conference paper, but not quite my first time in print--it happened once before, by accident. As a freshman in college, I asked my Linear Algebra professor, Steven Maurer, a question on an online class discussion board. Here it is, in the breathless and overwrought prose of my teenage years:

"This question has been haunting me, and I know I shouldn't expect definite answers. But how do mathematicians know when a theory is more or less done? Is it when they've reached a systematic classification theorem or a computational method for the objects they were looking for? Do they typically begin with ambitions as to the capabilities they'd like to achieve? I suppose there's nuanced interaction here, for instance, in seeking theoretical comprehension of vector spaces we find that these spaces can be characterized by possibly finite 'basis' sets. Does this lead us to want to construct algorithmically these new ensembles whose existence we weren't aware of to begin with? Or, pessimistically, do the results just start petering out, either because the 'interesting' ones are exhausted or because as we push out into theorem-space it becomes too wild and wooly to reward our efforts? Are there more compelling things to discover about vector spaces in general, or do we need to start scrutinizing specific vector spaces for neat quirks--or introduce additional structure into our axioms (or definitions): dot products, angles, magnitudes, etc.?

Also, how strong or detailed is the typical mathematician's sense of the openness or settledness of the various theories? And is there an alternative hypothesis I'm missing? "

Steve gave a thoughtful reply, years passed, and then a similar topic came up in conversation between him and the mathematician and popular math writer Philip J. Davis. The conversation sparked an essay by Davis in which he quoted Maurer and me (with permission); the essay recently became part of a philosophy-of-math book ('Mathematics and Common Sense: A Case of Creative Tension'), and I got mailed a free copy--sweet! Once again, I recommend the book to anyone who enjoys that kind of thing. The essay is online.

Free on Friday?

2008-05-10T23:26:00.000-04:00

You should come to Johnny D's in Davis Square and hear No Static, a 9-or-10-piece Steely Dan tribute band (standard preemptive clarification: Steely Dan is the band name, not a person).

No Static plays regularly in the area--I saw them in the fall and they were excellent, channeling the Dan's recordings with amazing care and fidelity.

Like most great artists, Steely Dan affords a unique perspective on the world, one best imbibed by listening to multiple albums in their weird entirety. This is exactly No Static's approach, so if you don't know Steely Dan, or have only heard a few catchy radio numbers, here's your chance to get hooked. Hope to see you there!

Complexity Calisthenics (Part I)

2008-05-01T14:02:00.000-04:00

Today I want to describe and recommend a paper I quite enjoyed: The Computational Complexity of Universal Hashing by Mansour, Nisan, and Tiwari (henceforth MNT). I think that this paper, while not overly demanding technically, is likely to stretch readers' brains in several interesting directions at once.

As a motivation, consider the following question, which has vexed some of us since the third grade: why is multiplication so hard?

The algorithm we learn in school takes time quadratic in the bit-length of the input numbers. This is far from optimal, and inspired work over the years has brought the running time ever closer to the conjecturally optimal n log n; Martin Fürer published a breakthrough in STOC 2007, and there may have been improvements since then. But compare this with addition, which can easily be performed in linear time and logarithmic space (simultaneously). Could we hope for anything similar for multiplication? (I don't believe that any of the fast multiplication algorithms run in sublinear space, although I could be mistaken. Here is Wiki's summary of existing algorithms for arithmetic.)

As MNT point out, the question becomes especially interesting once we notice that multiplication could be just as easily achieved in linear time/logspace... if we were willing to accept a different representation for our integer inputs! Namely, if we're given the prime factorizations of the inputs, we simply add corresponding exponents to determine the product. There are two hitches, though: first, we'd have to accept the promise that the prime factorizations given really do involve primes (it's not at all clear that we'd have the time/space to check this, even with the recent advances in primality testing); second, and more germane to our discussion, addition just got much harder!
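A toy illustration of this trade-off, assuming small positive integers and plain trial division (nothing like an efficient method); the helper names are my own. Multiplying in factored form is just adding exponent vectors, while adding seems to force a round trip through the ordinary representation and a fresh factorization.

```python
from collections import Counter

def factorize(n: int) -> Counter:
    """Prime factorization {prime: exponent} of n >= 1 by trial division."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

def value(f: Counter) -> int:
    """Multiply a factored representation back out."""
    result = 1
    for prime, exp in f.items():
        result *= prime ** exp
    return result

def multiply_factored(f: Counter, g: Counter) -> Counter:
    # Cheap: add exponents prime by prime (trusting that the keys are primes).
    return f + g

def add_factored(f: Counter, g: Counter) -> Counter:
    # No shortcut here: convert back, add, and factor the sum from scratch.
    return factorize(value(f) + value(g))

a, b = factorize(12), factorize(45)
print(value(multiply_factored(a, b)))   # 540
print(value(add_factored(a, b)))        # 57
```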

The situation is similar over finite fields of prime order (Z_p): in standard representation, addition is easy and multiplication is less so, while if we represent numbers as powers of a fixed primitive root, the reverse becomes true. This suggests a woeful but intriguing possibility: perhaps no matter how we represent numbers, one of the two operations must be computationally complex, even though we have latitude to 'trade off' between + and *. So we are invited to consider

Mental Stretch #1: Can we prove 'representation-independent' complexity lower bounds?

Stretch #2: Can we prove such lower bounds in the form of tradeoffs between two component problems, as seems necessary here?

In the setting of finite-field arithmetic, MNT answer 'yes' to both problems. The lower bounds they give, however, are not expressed simply in terms of time usage or space usage, but instead as the product of these two measures. Thus we have

Stretch #3: Prove 'time-space tradeoffs' for computational problems.

To be clear, all three of these 'stretches' had been made in various forms in work prior to MNT; I'm just using their paper as a good example.
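Returning briefly to the Z_p example: here is a toy sketch of the 'log' representation, where each nonzero element is stored as its exponent with respect to a primitive root g. The choice p = 11, g = 2 is arbitrary, and zero has no exponent, so a sum of zero is reported as None. Multiplication becomes addition of exponents mod p - 1. For addition I fall back on a precomputed table of Zech logarithms Z(n), defined by g^Z(n) = 1 + g^n, using g^a + g^b = g^a (1 + g^(b - a)). The table has p - 1 entries, exponentially many in the bit length of p, which hints at (but certainly does not prove) where the difficulty has moved.

```python
from typing import Dict, Optional

def log_table(p: int, g: int) -> Dict[int, int]:
    """Map each nonzero x mod p to the exponent e with g^e = x (mod p)."""
    table, x = {}, 1
    for e in range(p - 1):
        table[x] = e
        x = x * g % p
    if len(table) != p - 1:
        raise ValueError("g is not a primitive root mod p")
    return table

P, G = 11, 2
LOG = log_table(P, G)
# Zech logarithms: ZECH[n] = log(1 + g^n), or None when 1 + g^n = 0 (mod p).
ZECH = {n: LOG.get((1 + pow(G, n, P)) % P) for n in range(P - 1)}

def mul_exp(a: int, b: int) -> int:
    """g^a * g^b in exponent form: just add exponents mod p - 1."""
    return (a + b) % (P - 1)

def add_exp(a: int, b: int) -> Optional[int]:
    """g^a + g^b in exponent form via a Zech table lookup; None means the sum is 0."""
    z = ZECH[(b - a) % (P - 1)]
    return None if z is None else (a + z) % (P - 1)

print(mul_exp(LOG[3], LOG[7]) == LOG[3 * 7 % P])    # True
print(add_exp(LOG[3], LOG[7]) == LOG[(3 + 7) % P])  # True
```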

The combination of these three elements certainly makes the task seem daunting. But MNT have convinced me, and I hope to suggest to you, that with the right ideas it's not so hard. As the paper title indicates, their point of departure is Universal Hashing (UH)--an algorithmic technique in which finite fields have already proved useful. Use upper bounds to prove lower bounds. We can call this another Stretch, or call it wisdom of the ages, but it deserves to be stated.

So what is UH? Fix a domain D and a range R. A (finite) family H of functions from D to R is called a Universal Hash Family (UHF) if the following holds:

For every pair of distinct elements d, d' in D,

and for every pair of (not necessarily distinct) elements r, r' in R,

if we pick a function h at random from H,

Prob[h(d) = r, h(d') = r' ] = 1/|R|^2.

In other words, the randomly chosen h behaves just like a truly random function from D to R when we restrict attention to any two domain elements. (In typical applications we hope to save time and randomness, since H may be much smaller than the space of all functions.)

Here is what MNT do: they prove that implementing any UHF necessitates a complexity lower bound in the form of a time-space product. (To 'implement' a UHF H is to compute the function f_H(h, x) = h(x).)

This is in fact the main message of the paper, but they obtain our desired application to + and * as a corollary, by citing the well known fact that, fixing a prime field Z_p = D = R,

H = {h_{a, b}(x) = a*x + b mod p}

is a UHF, where a, b range over all field elements. (Left as an easy exercise.)
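For what it's worth, the exercise comes down to linear algebra: for d != d', the system a*d + b = r, a*d' + b = r' has exactly one solution (a, b) mod p, because d - d' is invertible mod p. Since |H| = p^2 = |R|^2, that is exactly the required probability 1/|R|^2. A brute-force check for small moduli (a sketch; the function name is my own) also shows why primality matters: for composite moduli, a*(d - d') cannot reach every residue.

```python
from collections import Counter
from itertools import product

def is_universal_affine_family(p: int) -> bool:
    """Brute-force check that {x -> a*x + b mod p : a, b in Z_p} is a UHF on Z_p.

    |H| = p^2 and |R|^2 = p^2, so for each pair of distinct inputs the
    definition asks that every target pair (r, r') be hit by exactly one (a, b).
    """
    family = list(product(range(p), repeat=2))  # all (a, b)
    for d, d2 in product(range(p), repeat=2):
        if d == d2:
            continue
        hits = Counter(((a * d + b) % p, (a * d2 + b) % p) for a, b in family)
        if len(hits) != p * p or max(hits.values()) != 1:
            return False
    return True

print([q for q in range(2, 12) if is_universal_affine_family(q)])   # [2, 3, 5, 7, 11]
```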

Note the gain from this perspective: implementing a UHF is a 'representation-invariant' property of a function, so Stretch 1 becomes possible. Moreover, Stretch 2 now makes more sense: it is only jointly that + and * define a UHF, so whatever complexity measure M we lower-bound for UHFs implies only a tradeoff between + and *.

It remains only to sketch the proof of time-space tradeoffs for UHFs, which in fact is a manageable argument along classic lines (the basic form of the argument is attributed to earlier work by Borodin and Cook). The upshot for us will be that for any program computing (any representation of) f(a, b, x) = a*x + b over an n-digit prime modulus, if T denotes worst-case time usage and S worst-case space, T*S = Omega(n^2). Very nice! (Although what this actually implies about the larger of the individual time-space products of + and * under this representation is not clear to me at the moment.)

Let's adjourn for today... wouldn't want to pull a muscle.

A simple plan to improve your graduate program

2008-04-16T19:50:00.000-04:00

It's just this: Food is key. We need more food. (In what follows, I'm speaking not just for MIT theory students, but for all students everywhere.)

Cancel the subscription to 'Journal of Timed Networked Multithreaded Aqueous Automata', and a few others. You've just saved about $20,000.

Use the money to provide copious snacks for students and faculty. Weekly receptions help, but really we're hungry all the time. To elaborate on that point:

-The student center is a tiresome 5-10 minutes away.

-Graduate students are low on cash. We work strange hours that discourage grocery shopping (and may not own a car). Some of us are newly weaned from the meal plans of our undergrad days, and we're only slowly learning to provide for ourselves. The food around here is expensive.

If department budget is truly an issue, there is another way, practiced with admirable success by UC San Diego's CSE department: recruit grad student volunteers to maintain a stocked snack-room, with foods purchased cheaply in bulk and paid for on the honor system. Of course, a snack-room should also be a social space.

Candy and tasty treats help, but it's too easy to over-rely on them and come crashing down. At some point we all wish there were fewer of these around the office. Consider in their stead:

-bagels
-raisins
-apples and bananas
-peanuts and peanut butter

...all cheap, real-tasting, and calorific.

That's it! An easy, cost-effective intervention that will keep students and faculty working happily in their offices on their next theorem or patentable device.

Some babies grow in a peculiar way

2008-04-04T22:23:00.000-04:00

Today I bumped into an order of growth I hadn't seen before, and thought I'd share it for a modest bit of mental aerobics.

Readers may well have seen functions of form

f(n) = (log n)^c,

known as 'polylogarithmic'. These are important in, e.g., query complexity, where we like the number of queries to be much smaller than the input size n when possible. Also emerging from such studies are the 'quasipolynomial' functions, of form

g(n) = 2^{(log n)^c}.

As a warmup--how fast do these things grow?

OK, now the main course tonight is the following:

h(n) = 2^{2^{(log log n)^c}}.

What do you make of these? And what questions need answering before we understand such a growth rate 'well enough'? I'm unsure and would love to hear your thoughts.
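One rewriting I find helpful, taking logs base 2 and c > 1: pulling a factor of the exponent out as a power turns the warmup function into a 'polynomial with slowly growing degree', and doing the same trick one level up handles h:

\[ g(n) = 2^{(\log n)^c} = \left(2^{\log n}\right)^{(\log n)^{c-1}} = n^{(\log n)^{c-1}}, \]

\[ h(n) = 2^{2^{(\log \log n)^c}} = 2^{(\log n)^{(\log \log n)^{c-1}}}. \]

So for c > 1, h eventually outgrows every quasipolynomial $2^{(\log n)^k}$, since the exponent $(\log \log n)^{c-1}$ eventually exceeds any fixed k; yet $h(n) = 2^{n^{o(1)}}$, since $(\log \log n)^c = o(\log n)$. (For c = 1, h(n) is just n.)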

News, and some Number Theory

2008-03-21T14:30:00.000-04:00

Time for a personal update: I'm enjoying myself greatly here in Cambridge, and was recently admitted as a transfer student to MIT's EECS department. The move has both personal and academic advantages for me, but let me emphasize that I think UCSD has a lot to offer prospective students, and in addition is home to many cool people whom I miss... I would be happy to offer advice about either school.

When I started writing here as an undergrad, I knew very few people in the worlds of TCS and Complexity, and the blog was a way of reaching out to a community I wanted to join. Now, thanks in part to blogging (thru which I met Scott, my advisor), I'm immersed in that community at a school with a large, vibrant Theory group. This is to say that, though writing here still interests me, it no longer answers an urgent personal need, and I am most likely to post new material in response to a request for specific topics or post types.

Today I was perusing a favorite book of mine, 'Gems of Theoretical Computer Science' by U. Schöning & R. Pruim. One chapter partially treats the undecidability of determining whether systems of Diophantine equations (polynomial equations whose solutions are required to be integers) have a solution. This was the tenth of Hilbert's problems from 1900, solved in 1970 and full of deep math.

The authors pose the following exercise: show that the version where variables must take nonnegative values is reducible to the integer version. (And vice versa; also, the rational-values version is reducible to the integral version; the converse appears to be open...)

Think about it...

The reduction they give: Given a set of polynomial equations {P_i(X_1, ... X_k)}, we want to determine if they're simultaneously satisfiable with nonnegative values. Introduce, for each j <= k, the variables Y_{j, 1}, Y_{j, 2}, Y_{j, 3}, Y_{j, 4}.

Now add to the system, for each j <= k, the constraint that

X_j = Y_{j, 1}^2 + Y_{j, 2}^2 + Y_{j, 3}^2 + Y_{j, 4}^2.

Claim: the new system is satisfiable over the integers iff the original one is satisfiable over the nonnegative integers! The proof, of course, is an easy consequence of Lagrange's Theorem that every nonnegative integer is the sum of 4 integer squares.
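Lagrange's Theorem is exactly what makes each X_j range over all the nonnegative integers and nothing else. A brute-force search (an illustration, certainly not a proof; the function name is my own) finds explicit representations for small n. The example 7 shows that all four squares can be needed.

```python
from math import isqrt
from typing import Optional, Tuple

def four_squares(n: int) -> Optional[Tuple[int, int, int, int]]:
    """Brute-force search for a <= b <= c and d with a^2 + b^2 + c^2 + d^2 = n."""
    if n < 0:
        return None  # a sum of squares is never negative
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            for c in range(b, isqrt(n - a * a - b * b) + 1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d * d == rest:
                    return (a, b, c, d)
    return None  # never reached for n >= 0, by Lagrange's Theorem

print(four_squares(7))   # (1, 1, 1, 2): 7 needs four nonzero squares
```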

So, my question is, could a reduction be found that doesn't rely on Lagrange's Theorem? Or its weaker variants where the constant 4 is replaced with some constant c > 4. Or maybe for some constant c, the proof is really so simple that I will be satisfied that we are performing this reduction with the cheapest tools.

If our plan in the reduction is to constrain the original variables in a form analogous to the above, namely

X_j = Q_j(Y, Z, ...),

where Y, Z, ... are new integral variables, is there any way around proving a version of Lagrange's Theorem? Generally we show that polynomials are identically nonnegative by expressing them as a sum of one or more squares, e.g.,

Y^2 + Z^2 - 2YZ = (Y - Z)^2.

Using this template, we'd be reliant on Lagrange. However, in his thesis, Minkowski conjectured that there exist nonnegative real polynomials *not* so expressible, and this was proved by Hilbert. A simple example I found online is

Q(X, Y, Z) = X^4*Y^2 + Y^4*Z^2 + Z^4*X^2 - 3X^2*Y^2*Z^2,

and another counterexample is derived systematically in the classy and useful inequalities book 'The Cauchy-Schwarz Master Class' by J.M. Steele. (Artin proved, however, that every polynomial that is nonnegative everywhere is expressible as a sum of finitely many squares of *rational* functions.)
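For the cyclic polynomial $X^4Y^2 + Y^4Z^2 + Z^4X^2 - 3X^2Y^2Z^2$, nonnegativity at least is a one-line AM-GM argument (it says nothing about the harder fact that the polynomial is not a sum of squares of polynomials):

\[ \frac{X^4Y^2 + Y^4Z^2 + Z^4X^2}{3} \geq \sqrt[3]{X^4Y^2 \cdot Y^4Z^2 \cdot Z^4X^2} = \sqrt[3]{X^6Y^6Z^6} = X^2Y^2Z^2. \]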

Could it be that one of these creatures is easier to use in the reduction (both to prove it's nonnegative, and ranges over all nonnegative values)? Somehow I doubt it. Anyways, I just wanted to point to an instance where a reduction one would expect to be straightforward seems to require fairly nontrivial understanding of the problem domain.

Beasts of Probability and Plane Geometry

2008-03-07T16:43:00.000-05:00

Say you're trying to predict whether some event E occurs. There is another collection of events I_1, I_2, ..., I_k, which are positive predictors of E: for every j, E occurs with probability at least .99 conditioned on the event that I_j occurs.

Can we lower-bound the probability that E occurs conditioning on the event that *at least one* I_j occurs?

(Think about it before reading on.)

Here's a simple example of what can go wrong: let the underlying probability space be a sequence of n unbiased coin flips. Let E be the event that at least 2/3 of the flips come up heads. For each subset S of {1, 2, ..., n} of size exactly .4n, let I_S be the event that all the coin flips indexed by S come up heads.

If n is large enough, we have that

i) E is almost surely false, yet

ii) Almost surely, some I_S is satisfied--even though

iii) Conditioning on any fixed event I_S, E becomes almost surely true (since we then expect half of the remaining flips to come up heads, yielding about a .4 + .5*.6 = .7 fraction of heads total).
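All three claims can be checked with exact binomial tail sums. The Python sketch below uses the helper names tail and coin_example, which are made up for this illustration, and it assumes n is a multiple of 15 so that .4n and 2n/3 are integers. Two observations make the computation short. Some I_S holds exactly when at least .4n flips are heads. E implies that union, because 2/3 > .4, so Pr[E | union] = Pr[E] / Pr[union].

```python
from math import comb


def tail(m: int, t: int) -> float:
    """Pr[Bin(m, 1/2) >= t]: an exact integer sum, divided by 2**m once at the end."""
    t = max(t, 0)
    return sum(comb(m, i) for i in range(t, m + 1)) / 2**m


def coin_example(n: int):
    assert n % 15 == 0, "assume .4n and 2n/3 are integers"
    s = 2 * n // 5            # |S| = .4n
    need = 2 * n // 3         # E: at least 2n/3 of the n flips are heads
    p_E = tail(n, need)
    p_union = tail(n, s)      # some I_S holds iff at least .4n flips are heads
    # Given I_S, the s flips in S are heads and the other n - s flips are still fair.
    p_E_given_IS = tail(n - s, need - s)
    # E is contained in the union, so conditioning on the union just divides by its probability.
    p_E_given_union = p_E / p_union
    return p_E, p_union, p_E_given_IS, p_E_given_union
```

With n = 3000, Pr[E] and Pr[E | union of the I_S] both come out astronomically small. Pr[union] is essentially 1, and Pr[E | I_S] exceeds .99, matching i) to iii).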

One can also modify this example to make conditioning on the union of the I_j's actually decrease the probability that E occurs.

This kind of conclusion seems somewhat pathological and inconvenient, so it's natural to look for restrictions that prevent it from arising. The simplest would be to restrict the number, k, of predictor events: under the above hypotheses, the probability of E conditioned on the union of the I_j's is at least

.99 / (1 + .01(k - 1)).

(To see this, think about the worst possible case, which resembles a 'sunflower' in probability space.)
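The bound can be derived in a few lines. Write U for the union of the I_j's. The symbols a_j, b_j and A below are notation introduced here, not taken from the original. The argument uses only that x/(x+y) is increasing in x and decreasing in y. Equality holds for the sunflower: a common core of probability A inside E, plus k disjoint petals outside E, each of probability A/99.

```latex
a_j = \Pr[I_j \cap E], \qquad b_j = \Pr[I_j \setminus E], \qquad
\Pr[E \mid I_j] \ge .99 \;\Longrightarrow\; b_j \le \frac{a_j}{99}.

A = \max_j a_j: \qquad
\Pr[U \cap E] \ge A, \qquad
\Pr[U \setminus E] \le \sum_{j=1}^{k} b_j \le \frac{kA}{99},

\Pr[E \mid U]
  = \frac{\Pr[U \cap E]}{\Pr[U \cap E] + \Pr[U \setminus E]}
  \ge \frac{A}{A + kA/99}
  = \frac{99}{99 + k}
  = \frac{.99}{1 + .01(k - 1)}.
```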

A more interesting direction is to restrict the structure of the predictor events within probability space. For instance, suppose that the probability space is a uniformly drawn point from the unit interval, E is an arbitrary (measurable) subset of the interval, and each I_j is the event that the point lands in some fixed subinterval. Then, regardless of the number of I_j's, we can conclude that E occurs with high probability conditioned on the union of the I_j's; not quite .99, but close. See my closely related earlier post for details.
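The details are in the earlier post. As a hedged illustration, here is one standard route to a bound of this kind, not necessarily the argument used there. Any finite family of intervals has a subfamily with the same union in which every point is covered at most twice. If the hypothesis is applied only to those chosen intervals, then Pr[U \ E] <= .01 * (sum of Pr[I_j] over chosen j) <= .02 * Pr[U]. So Pr[E | U] >= .98 for any finite number of I_j's. The greedy sketch below finds such a subfamily. The name thin_subcover is made up, and intervals are assumed closed, with left < right.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [left, right], assumed left < right


def thin_subcover(intervals: List[Interval]) -> List[Interval]:
    """Greedily choose a subfamily with the same union in which every point
    lies in at most two chosen intervals (chosen[k] and chosen[k + 2] are disjoint)."""
    ivs = sorted(intervals)  # sort by left endpoint
    chosen: List[Interval] = []
    frontier = float("-inf")  # the union is already covered up to this point
    i = 0
    while i < len(ivs):
        if ivs[i][0] > frontier:
            # Gap in the union: restart at the next left endpoint.
            frontier = ivs[i][0]
        # Among the intervals starting at or before the frontier, keep the one reaching furthest right.
        best = ivs[i]
        while i < len(ivs) and ivs[i][0] <= frontier:
            if ivs[i][1] > best[1]:
                best = ivs[i]
            i += 1
        if best[1] > frontier:
            chosen.append(best)
            frontier = best[1]
    return chosen
```

For example, from [(0, 3), (1, 4), (2, 5), (3, 6), (10, 11), (10.5, 12)] it keeps (0, 3), (3, 6), (10, 11) and (10.5, 12). Each batch of intervals starting before the current frontier contributes only its longest-reaching member. That is why a chosen interval can overlap only its immediate neighbours in the list.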

It is natural to try to extend this result to higher dimensions, and for predictor indicator-sets given by axis-parallel rectangles this succeeds by an induction (although the effect gets exponentially weaker with the dimension). Similar results hold if the sets are required to be 'fat' convex bodies, in which case an argument much like the 1-dimensional one works.

However, allowing rotated rectangles destroys the effect even in two dimensions. Here's one interpretation: consider the unit square as a city, and take the set S (playing the role of E) to be the set of Democratic households.

In pathological cases, it's possible to find a covering of the square by overlapping convex 'precincts', such that

i) in each precinct, 99% of the households are Democratic, yet

ii) overall, 99% of households are Republican!

Such sets seem truly bizarre. For a long time I was convinced they couldn't exist, but after failing to prove this, I finally tracked down a construction in a Harmonic Analysis textbook by Elias Stein (who, besides being impressive in his own right, was PhD advisor to Fields Medalists Charles Fefferman and Terence Tao). These sets, whose construction resembles a kind of hydra spawning ever more and ever tinier heads, are related to the better-known Besicovitch/Kakeya sets. One can even achieve a kind of fantastical limit, in the following theorem for which Stein provides a reference:

There exists a subset S of measure zero in the unit square such that, for every point x in the square, there exists a line L through x such that S contains all of L's intersection with the square except, possibly, x itself!

That's a Wrap

2007-12-24T23:34:00.000-05:00

I had some gift-giving duties to attend to today... halfway into the wrapping phase, my mind wandered in a predictable direction:

Given the dimensions of a rectangular box, what is the smallest amount of paper that will wrap it?

One would assume we want to cut out a rectangle of wrapping paper, since otherwise we get annoying scraps (and also the problem is mathematically trivial if we can cut a cross-type shape).

I haven't had any time to toy with this problem, but I had a feeling one particular MIT dude might have... and I was right. True to form, Erik Demaine delivers a whole page about various wrapping problems he's explored with colleagues.

To anyone reading who teaches a programming course: a fun algorithms assignment could be spun out of problems of the above type. If the program output folding instructions too, so much the better.

Happy holidays, all!

Cardinal Rules

2007-12-17T15:13:00.001-05:00

1. Pick your favorite countable set S. Let F be a 'nested' family of distinct subsets of S; that is, if A, B are members of F, then either A is contained in B or B is contained in A.

Then clearly F can be at most countable... right?

A puzzle from Bollobás's recent book.

2. Given a collection C of functions from N = {1, 2, 3, ...} to N, say that C is unbounded if for any function f (in C or not), there exists a function g in C such that g(i) > f(i) for infinitely many i.

Clearly the class C of all functions from N to N is unbounded. Also, standard diagonalization techniques tell us that no countable collection C can be unbounded (exercise).
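One standard solution to the exercise is a diagonal bound. List the countable collection as g_1, g_2, ... and set f(i) = max(g_1(i), ..., g_i(i)). For each fixed j, f(i) >= g_j(i) whenever i >= j. So g_j beats f only at the finitely many points i < j, and no member of the collection beats f infinitely often. In the Python sketch below, the enumeration is modelled as a two-argument function g(j, i) = g_j(i), and the name diagonal_bound is hypothetical.

```python
from typing import Callable

# g(j, i) stands for g_j(i): the j-th function of a countable collection, evaluated at i.
Enumeration = Callable[[int, int], int]


def diagonal_bound(g: Enumeration) -> Callable[[int], int]:
    """Return f with f(i) = max(g_1(i), ..., g_i(i)), for i in N = {1, 2, 3, ...}.

    For each fixed j, f(i) >= g_j(i) for every i >= j, so g_j(i) > f(i)
    can only happen for the finitely many i < j."""
    def f(i: int) -> int:
        return max(g(j, i) for j in range(1, i + 1))
    return f
```

For instance, with g(j, i) = i ** j + 7 * j, every g_j stays at or below f from i = j onward, even though the g_j grow faster and faster.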

The question then becomes: what is the smallest cardinality of any unbounded collection C? Assuming the Continuum Hypothesis is false, where does the threshold lie?

Fortunately or unfortunately, there seems to be little we can say about this issue within the standard axioms of set theory. Assuming CH is false, the threshold could be as low as the first uncountable cardinal, or as large as the continuum, or in between--this I learned from Jech's encyclopedic book Set Theory. There are many questions of this flavor, where the key construction (in our case, constructing a 'bounding function' for a given 'small' collection C) is easy to do in the countable setting, but essentially impossible to analyze when there are uncountably many requirements floating around.

The need to satisfy uncountable collections of requirements in a construction is so common that new axioms for set theory are put forward expressly to assert the possibility of doing so (under some restrictions that make the axioms plausible); 'Martin's Axiom' is an especially widely used axiom of this type. Kunen's book on set theory seems like a good reference for MA (though I'm just starting to learn about it).

Of course, the Continuum Hypothesis itself makes it easier to satisfy collections of requirements of size smaller than the continuum, since such collections are then at most countable! This leads to some bizarre constructions, many of which stand or fall with CH itself. The excellent book Problems and Theorems in Classical Set Theory gives many of these, and Bill Gasarch recently exposited one such application in Euclidean Ramsey theory. On the other hand, Martin's Axiom, which follows from CH but is also consistent with its negation, allows set theorists to prove many interesting results while remaining more 'agnostic' about cardinality issues.

I think that set theory is a blast, and that its logical structure resonates with issues in computer science. But I'm far from expert on the subject, so if I've said anything inaccurate please let me know.