Thursday, March 26, 2009

A Classic Problem

Consider a double-ended queue holding 1000 numbers. Alice and Bob take turns removing an element from either end of the queue. The player whose removed elements add up to the larger sum wins.

Who wins the game?

This problem appeared in IOI 1996 in Hungary, but it's a well-known puzzle by now.
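
If you want to experiment before thinking (this brute force is not the intended elegant argument, and the numbers below are made up for illustration), the value of any small instance under optimal play can be computed with a standard interval DP:

    from functools import lru_cache

    def first_player_margin(values):
        """Margin (first player's sum minus the second's) under optimal play,
        for the game where players alternately take an element from either end."""
        vals = tuple(values)

        @lru_cache(maxsize=None)
        def best(i, j):
            # best(i, j) = margin the player to move can secure on vals[i..j]
            if i > j:
                return 0
            return max(vals[i] - best(i + 1, j), vals[j] - best(i, j - 1))

        return best(0, len(vals) - 1)

    print(first_player_margin([3, 1, 4, 1, 5, 9, 2, 6]))  # positive: the first player wins this instance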

Wednesday, March 25, 2009

Communication Complexity (II): Randomized, Distributional

Prelude: A while back I posted an introduction to communication complexity, in which I showed how to obtain deterministic lower bounds by analyzing monochromatic rectangles. That post is also available in PDF if you want to review it.

I will continue with a series of posts giving a "practical introduction" to communication complexity. Hopefully this will be a good reference for anyone who wants to work in the area. I will ignore as much theory as possible (I find the theory of communication complexity quite dry compared to the applications) and take you straight to proving interesting bounds.

Episode 2. This episode will set up the theory of randomized communication. There are two common extensions beyond deterministic protocols:
randomized complexity:
The protocol tosses some coins to determine its actions. The communication complexity is bounded independently of the coins. For any input, the protocol gives a correct answer with probability 2/3 over the coin tosses. This probability can be amplified to any 1-ε by running O(lg(1/ε)) independent repetitions in parallel and taking the majority answer (the standard calculation appears after these definitions).

distributional complexity:
A distribution D is specified over the input pairs (x,y), where x is Alice's input and y is Bob's. The protocol uses some fixed communication complexity, and it gives a correct answer with probability 1-ε over D. Note that amplification is generally impossible (two independent copies of a deterministic protocol behave identically when run on the same input).
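To back up the amplification claim in the definition of randomized complexity (I am filling in the standard calculation, which is not in the original post): if each of k independent runs errs with probability at most 1/3, the majority answer is wrong only when at least k/2 runs err, so

    \Pr[\text{majority wrong}] \;\le\; \sum_{i \ge k/2} \binom{k}{i} (1/3)^i (2/3)^{k-i}
                               \;\le\; 2^k \cdot (2/9)^{k/2} \;=\; (8/9)^{k/2},

which drops below ε once k = O(lg(1/ε)).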
The case of randomized complexity further splits into two subcases:
public coins:
The coin tosses are observed by both Alice and Bob.

private coins:
Each player tosses his own coins privately. They may, of course, communicate the outcomes if they wish. The players trust each other fully, so privacy only makes life more difficult.
Canonical example: Equality. Let Alice and Bob receive two n-bit vectors x, y ∈ {0,1}^n. Their goal is to determine whether x=y. The complexity of this function under each measure is:
  • Deterministically, the total communication must be Ω(n), so the players cannot do anything nontrivial. Oddly, this does not follow from the richness lower bound of episode 1 (the equality function is only [2^n, 1]-rich). Nonetheless, it can be deduced by an elementary argument, as follows.

    We claim that any 1-rectangle cannot be nontrivial (i.e., have more than one cell). Say the sides of the rectangle are {x1, x2, ...} and {y1, ...}, with at least two rows. Then x1 = y1 and x2 = y1, even though x1 and x2 are distinct -- a contradiction. (A rectangle with one row and several columns is handled symmetrically.)

    On the other hand, there are 2^n cells with value one (the diagonal of the matrix). So there must be at least 2^n different 1-rectangles, which can only be obtained with communication Ω(n).

  • The public-coin complexity is O(1). Choose a hash function h : {0,1}^n → [100] by public coins. Then Alice communicates h(x), and Bob says yes iff h(x) = h(y). The error probability is at most 1/100 if h comes from a universal family (a concrete sketch appears after this list).

  • The private-coin complexity is Θ(lg n). For the upper bound, Alice chooses a hash function privately, and communicates both the hash function and h(x). The hash function takes O(lg n) bits to describe (if you know your universal hash functions: the optimal description size grows like lg lg of the universe size, which in this case is 2^n).

    The lower bound is not important.

  • The distributional communication complexity is only defined once we specify a distribution. For instance, if D chooses x and y uniformly and independently, a protocol with zero communication works with probability 1 - 1/2^n. Indeed, Pr[x=y] = 1/2^n, so you may just always answer "no."
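
To make the public-coin protocol concrete, here is a minimal Python sketch. The particular hash (a polynomial fingerprint modulo a fixed Mersenne prime) is my choice of a universal-style family, not something prescribed above; any family with small collision probability works.

    import random

    P = (1 << 61) - 1   # a fixed Mersenne prime; fingerprints live in Z_P

    def fingerprint(v, r):
        """Polynomial fingerprint of the integer v (read in 60-bit chunks c_0, c_1, ...)
        evaluated at the point r: the sum of c_i * r^i, all modulo P."""
        acc, power = 0, 1
        while v:
            acc = (acc + (v & ((1 << 60) - 1)) * power) % P
            power = (power * r) % P
            v >>= 60
        return acc

    def equality_public_coins(x, y):
        """One run of the public-coin Equality protocol: Alice holds x, Bob holds y,
        and the random evaluation point r is the public coin toss seen by both."""
        r = random.randrange(1, P)
        # Alice sends fingerprint(x, r): a constant number of bits, independent of n.
        # Bob answers "equal" iff it matches his own fingerprint of y.
        return fingerprint(x, r) == fingerprint(y, r)

    x = random.getrandbits(1000)
    print(equality_public_coins(x, x))      # always True
    print(equality_public_coins(x, x ^ 1))  # False except with tiny probability

For the private-coin bound above, Alice would additionally have to describe the hash function she picked to Bob, which is where the extra Θ(lg n) bits come from.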
Some theory. You can spend some time investigating the relations between these complexity measures, but personally I find this to be dry theory. Here are a few basic factoids you should know.

Lemma 1. Public-coin complexity ≤ private-coin complexity ≤ public-coin complexity + O(lg n), where n is the size of the inputs.

Proof: This result may seem surprising at first -- Alice and Bob may toss many public coins and use them in a nontrivial way. How can they simulate this with private coins and just O(lg n) extra bits of communication?

In fact, however, the proof is pretty standard in the land of derandomization. The main idea is that you can sample from the space of coin tosses and understand the behavior of the protocol on the sample alone. Say you sample 100n strings of random coin tosses. For any fixed input (x,y), the average error on the sample is roughly equal to the average error over truly random coins, with probability 1 - 2^{-Ω(100n)} over the choice of the sample.

But there are only 2^{2n} possible inputs. Thus, there exists a sample which simultaneously works for all inputs (for a large enough value of 100). The protocol may restrict itself to this sample without loss of generality. Then Alice can pick one of the sample strings by private coins, and communicate the choice with lg(100n) = O(lg n) bits.
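
Spelling out the union bound hidden in that last step (with 100 standing in for a sufficiently large constant):

    \Pr_{\text{sample}}\big[\exists (x,y) \text{ on which the sample misbehaves}\big]
        \;\le\; 2^{2n} \cdot 2^{-\Omega(100n)} \;<\; 1,

so some fixed collection of 100n coin strings works for every input simultaneously, and it can be hard-wired into the protocol.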


This lemma shows that the gap that we saw in Equality is the maximum possible. The take-away message is that we almost never worry about private-coin complexity, since this restriction changes the complexity so little. When people say "randomized protocol" they almost always mean "public-coin randomized protocol."


Lemma 2. Say some problem has a randomized (public-coin) protocol with error probability ε. Then, for any distribution D, there exists a protocol with the same communication and distributional error ε.

Proof. Let e(x,y) = 1 iff the protocol makes an error on input (x,y); this is a random variable over the coins. The randomized protocol gives (∀) x,y: E_coins[e(x,y)] ≤ ε.

Since every term is bounded, so is the mean: E_{(x,y)~D}[ E_coins[e(x,y)] ] ≤ ε.

But then we can switch the expectations: E_coins[ E_{(x,y)~D}[e(x,y)] ] ≤ ε.

Finally, some setting of the coins does not exceed the mean: (∃) coins: E_{(x,y)~D}[e(x,y)] ≤ ε. Having fixed those public coins, the protocol is deterministic.

Observe that the protocol depends on the distribution D, because the coins are fixed after seeing the distribution, and they're built into the protocol. 


So what does this mean in practice? Just like randomized algorithms are more natural for upper bounds, distributional analyses are more natural for lower bounds. A typical lower bound proof starts by proposing a distribution D on the inputs that makes the problem hard. Then, it proves that no protocol can work well on that distribution, implying a distributional lower bound. This is the strongest form of the lower bound.

However, as an added bonus, the lower bound implies that no randomized protocol can solve the problem any better. If you think about it, this is obvious: any randomized algorithm can be made deterministic by picking its best set of coins (the coins minimizing error on the hard distribution).

In principle, it might be possible to show randomized lower bounds directly, without also showing distributional lower bounds. This would be somewhat odd, because it would mean you are showing a lower bound, but you cannot tell us how to choose hard inputs (then why is the problem hard?). Currently, all known randomized lower bounds also give distributional lower bounds.

The above lemma is sometimes called "the easy direction of Yao's lemma." I oppose the terminology, since it's silly to be giving credit for kindergarten observations. The real "Yao's Lemma" (or the hard direction thereof) is the following:

Lemma 3. Assume that for any distribution D, there exists a protocol with communication c and distributional error ε. Then, there exists a randomized (public-coin) protocol with communication c and error ε.

This tells us that trying to find a hard distribution always makes sense. If there exists no distribution that makes the problem hard, it must be that it's actually easy (for randomized algorithms). This is an interesting philosophical statement ("you can always find a hard distribution"), but of course, it is never used in practice.

The proof is based on von Neumann's minimax theorem (something you might learn about in economics and game theory). The proof is not important for communication lower bounds, though the minimax principle is something well worth understanding.
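
For the record, here is the standard way to package Lemmas 2 and 3 together (the notation is mine, not from the post): writing R_ε(f) for the public-coin randomized complexity with error ε, and D_ε^μ(f) for the deterministic complexity with distributional error ε over inputs drawn from μ,

    R_\varepsilon(f) \;=\; \max_{\mu}\, D^{\mu}_{\varepsilon}(f).

Lemma 2 gives the ≥ direction (fix the best coins for the given μ); Lemma 3, via the minimax theorem, gives the ≤ direction.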

Sunday, March 22, 2009

Blogs, research, being social

Richard Lipton has a new blog. Noam Nisan has a new blog.


Does TCS have too many blogs for its own good? No doubt about that. My Google Reader list of blogs was long ago downgraded from STOC/FOCS status ("should be aware of") to SODA status ("occasionally you can see interesting things").

This partially explains why I haven't been writing much. What should I write about? Maybe I should describe results in my area. But I never learned anything about graph drawing from David Eppstein; for me, graphs are means, not ends.

I never learned any cryptography from Luca. The really basic problems in cryptography are basic complexity questions -- I already know them, and have no clue how to attack them. The cryptographic side of crypto... well, they can deal with it themselves. I'm only interested enough to hear about it when it's done. [NB: Luca's rare posts about Italian research are cool though -- I use them to argue that research in Romania sucks not because we are emerging from 50 years of communism, but because we descend from Rome. How many researchers from those countries who never worked in the US can you name?]

I now understand why online courseware can never replace physical courses: in a physical course, the professor has to waste quite a few hours of his day preparing and teaching. For the student, this sacrifice should (in principle) mean that the stuff is important enough. What would be an equivalent online mechanism that could extract an equivalent sacrifice? Maybe I should pledge $10 to a charity for each person that reads one of my technical blog posts carefully.

On the other hand, maybe I should write non-technical blog posts trying to influence the community. But I never became convinced that I should include experiments [1] in my papers. And neither did "the community": if you ever read comments on Michael's somewhat puritan blog, you have undoubtedly noticed that he mostly gets comments from people who agree with him, and is usually ignored by theory die-hards.
[1] If you really must know, I find that algorithm engineering is a craft in itself (which I have been lucky enough to practice in my non-TCS life). Asking for experiments is like saying you are only interested in papers that completely close a problem.
The obvious issue is the one of sacrifice that I mention above. If you want to convince me of something, let's grab a beer and I'll listen. Proselytizing on the web is too cheap.

Well, then, maybe I should not care too much about content, and should write general observations about life, the universe, and everything (a la Lance, Scott). The main problem is that we already know the answer.

For as long as I can remember, I have been trying to understand society through mathematics, computer science, economics, mechanism design, etc. If you have a țuică with me, you run a risk of hearing my theories of what the continuum hypothesis really means, what undecidability really means, why the world is the way it is, what the role of research is, how to change this or that aspect of society, and so on. But I do have enough drinks with my friends, and I am left with zero motivation to blog about such things. Online mental exercises are wasted on me.


Which brings me to my next point... The social aspect of research was already bad enough. I grew up going to computer contests, where you were either good enough to win, or not good enough (for instance, in my first year of high school, I missed qualifying for the IOI by 2 points out of 900, which I found fair enough, if somewhat random). Ideally, research positions should also be handed out by a 5-year contest for solving hard problems, not by who can draw the biggest target around his arrow.

The social aspects of the research world may be inescapable. But on blogs, hype travels much faster, and waves grow into tsunamis. Noam talks about the impact of blogs on highlighting results. For all our talk about theoretical depth, our work is already too shallow, if you ask me. I'd hate it if research turned into a Facebook-style popularity contest. (NB: This discussion is unrelated to Mark Braverman's recent result, which did get quite a few blog posts.)

Maybe it's time to think about some problems.

Friday, March 20, 2009

The LSD Proof

From my colleagues among Israeli faculty, I hear strange tales about dealing with people who want to be your PhD students. It seems so many students are enthusiastic about doing a PhD (with theory faculty!) that hazing them becomes a good idea. As the stories go, you should tell your candidate students: "Go read this paper, and come back once you've understood it."

The trick, of course, lies in your choice of the paper. If you ever want to do this to somebody who dreams of working in communication complexity, you might try this lower bound for lopsided set disjointness. The proof is straightforward to me, but it is fairly long and diverse, throwing all the tricks in the book at this problem. If you want to work in communication complexity, it makes sense to first reach the level where the proof is straightforward to you.

The story of set disjointness stretches back to the dawn of communication complexity. An optimal deterministic lower bound is easy to prove (see my earlier post). From what I understand, in the late 80's, proving a randomized lower bound was a major problem. The issue is that you cannot use independent distributions for Alice's and Bob's inputs, since two random sets will intersect with overwhelming probability. 

A randomized lower bound was obtained in 1992 by [Kalyanasundaram, Schnitger] and by [Razborov]. The technique was explained in an information-theory view by [Bar-Yossef, Jayram, Kumar, Sivakumar], who also found new interesting applications.

The basic idea is to consider a distribution in which the two sets never intersect. Let the universe be split into n blocks. For each block, we either choose Bob's element publicly or choose Alice's element publicly; the other player gets a private random element of the block that differs from the public choice, making the sets disjoint (one way to sample such a distribution is sketched below). Under this distribution, the blocks are independent. Thus, if the protocol communicates o(n) bits in total, I can find a block about which it communicates only o(1) bits of information. I will now alter the distribution on that block alone, making the two players' choices there independent. The protocol stays the same, so it cannot spot the difference, and hence it will have large error.
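
Here is one way to read that description as code; the block size B and the uniform choices are my guesses at the unspecified details, so treat this as a sketch of the shape of the distribution rather than the exact one used in the proofs.

    import random

    def sample_disjoint_instance(n, B=2, rng=random):
        """Sample (Alice's set, Bob's set, public information) from a distribution of
        the kind described above: the universe [n*B] is split into n blocks of size B,
        and in each block one player's element is chosen publicly while the other
        player privately picks a different element of the same block."""
        alice, bob, public = set(), set(), []
        for i in range(n):
            block = range(i * B, (i + 1) * B)
            if rng.random() < 0.5:                  # Bob's element is the public one
                b = rng.choice(list(block))
                a = rng.choice([e for e in block if e != b])
                public.append(("Bob", b))
            else:                                   # Alice's element is the public one
                a = rng.choice(list(block))
                b = rng.choice([e for e in block if e != a])
                public.append(("Alice", a))
            alice.add(a)
            bob.add(b)
        return alice, bob, public

    A, B_, pub = sample_disjoint_instance(8)
    assert not (A & B_)   # by construction, the two sets never intersect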

In STOC'95, [Miltersen, Nisan, Safra, Wigderson] introduced asymmetric communication complexity (in which Alice and Bob have inputs of significantly different sizes), and showed how asymmetric lower bounds imply interesting data structure lower bounds. They proved an optimal deterministic lower bound for asymmetric set disjointness, and left the randomized case as an "interesting" open problem. I suspect, however, that they simply did not have the time or motivation to work out how the previous techniques applied in the asymmetric case.

In my FOCS'06 paper with Alex Andoni and Piotr Indyk, we showed the first exciting application of lopsided set disjointness (LSD) to data structures: it implied that data structures with O(1) query time cannot solve (1+ε)-approximate nearest neighbor with less than n^{Ω(1/ε^2)} space. This application only required a weaker version of the LSD lower bound, which could be proved with independent inputs given to Alice and Bob.

The really exciting development was my FOCS'08 paper, which showed that many interesting lower bounds could be obtained by just the right reductions from LSD. At this stage, I needed the optimal LSD lower bound --- so I just had to sit down and claim that the proof would appear in the journal version.

Ridden with guilt (and reminders from Jeff, acting as editor of the special issue), I finally had to write down the proof. Once you really, really understand the old symmetric technique, and you're comfortable with the standard tricks from communication lower bounds, this proof is not too exciting. I'm glad to be done with it.

Monday, March 16, 2009

Midterm

As you know, I am teaching Computability and Complexity at Berkeley. Here is the midterm I gave today. It may be fun for my younger readers to look through it. Leave comments.


Regarding problem 3: I believe nobody should get a CS degree without being able to reason about grammars, precedence, associativity, etc. This is far more important than knowing pushdown automata, the pumping lemma, or anything like that.

Problems 4 and 7 are typical interview puzzles. Unfortunately, not too many people recognized problem 7 as a disguised version of the streaming problem of finding a majority element (the classic algorithm is sketched below). I should have asked for Socrates to use only O(n) debates.
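
For readers who have not seen it, here is the classic one-pass majority algorithm (Boyer-Moore voting) that the problem was hiding; the toy input is mine, and a real solution would add a second pass to verify the candidate.

    def majority_candidate(stream):
        """One-pass majority vote: keep a single candidate and a counter.
        If some element occurs more than half the time, it is returned;
        otherwise the output is arbitrary and must be verified separately."""
        candidate, count = None, 0
        for x in stream:
            if count == 0:
                candidate, count = x, 1
            elif x == candidate:
                count += 1
            else:
                count -= 1
        return candidate

    print(majority_candidate("ABABBABB"))  # 'B', which occurs 5 times out of 8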

Problem 8 is actually hard. Even if you have an intuitive idea of why Inca Machines cannot recognize more than regular languages, you need to find a couple of tricks to turn it into a formal proof.