Wednesday, March 25, 2009

Communication Complexity (II): Randomized, Distributional

Prelude: A while back I posted an introduction to communication complexity, in which I showed how to obtain deterministic lower bounds by analyzing monochromatic rectangles. That post is also available in PDF if you want to review it.

I will continue with a series of posts giving a "practical introduction" to communication complexity. This will hopefully be a good reference to whoever wants to work in the area. I will ignore as much theory as possible (I find the theory of communication complexity to be quite dry compared to the applications), and take you straight to proving interesting bounds.

Episode 2. This episode will set up the theory of randomized communication. There are two common extensions beyond deterministic protocols:
randomized complexity:
The protocol tosses some coins to determine its actions. The communication complexity is fixed independent of the coins. For any input, the protocol gives a correct answer with probability 2/3 over the coin tosses. This probability can be amplified to any 1-ε, by running O(lg(1/ε)) independent repetitions in parallel, and taking the majority answer.

distributional complexity:
A distribution D is specified over the input pairs (x,y), where x is Alice's input, and y is Bob's. The protocol uses some fixed communication complexity, and it gives a correct answer with probability 1-ε over D. Note that amplification is generally impossible (two independent copies behave identically when run on the same input).
The case of randomized complexity further splits into two subcases:
public coins:
The coin tosses are observed by both Alice and Bob.

private coins:
Each player tosses his own coins privately. They may, of course, communicate the outcomes if they wish. The players trust each other fully, so privacy only makes life more difficult.
Cannonical example: Equality. Let Alice and Bob receive two n-bit vectors x,y ∈ {0,1}n. Their goal is to determine whether x=y. The complexity of this function by each measure is:
  • Deterministically, the total communication must be Ω(n), so the players cannot do anything nontrivial. Oddly, this does not follow by richness, as the lower bound is episode 1 (the equality function is only [2n,1]-rich). Nonetheless, it can be deduced elementary, as follows.

    We claim that any 1-rectangle cannot be nontrivial (i.e. have more than one cell). If the sides of the rectangle are {x1, x2, ...} and { y1, ...}, then x1 = y1, x2 = y1, but x1 and x2 were distinct. 

    On the other hand, there are 2n cells with a value of one (the diagonal of the matrix). So there must be Ω(2n) different rectangles, which can only be obtained by communication Ω(n).

  • The public coin complexity is O(1). Choose a hash function h:{0,1}n -> [100] by public coins. Then Alice can communicate h(x), and Bob says yes iff h(x)=h(y). The error probability is 1/100 if h comes from a universal family.

  • The private coin complexity is Θ(lg n). For the upper bound, Alice chooses a hash function, and communicates both the hash function and h(x). The hash function takes O(lg n) bits to communicate (if you know stuff about universal hash functions, the optimal description size grows like loglog of the universe, which in this case is 2n).

    The lower bound is not important.

  • The distributional communication complexity is only defined if we specify some distributions. For instance, if D chooses x and y uniformly and independently, a protocol with zero communication works with probability 1 - 1/2n. Indeed, Pr[x=y] = 1/2n, so you may just always answer "no."
Some theory. You can spend some time investigating the relations between these complexity measures, but personally I find this to be dry theory. Here are a few basic factoids you should know.

Lemma 1. Public coin complexity ≤ Private coins complexity ≤ Public coins complexity + O(lg n), where n is the size of the inputs.

Proof: This result may seem surprising at first -- Alice and Bob may toss many public coins and use them in a nontrivial way. How can they simmulate this with private coins and just O(lg n) extra bits of communication?

In fact, however, the proof is pretty standard in the land of derandomization. The main idea is that you can sample from the space of coin tosses, and understand the behavior just on the sample. Say you sample 100n strings of random coin tosses. On some input (x,y), the average error probability on the sample is roughly equal to the average error over the true random coins, with probability 1 - 2-Ω(100 n) over the random sample. 

But there are only 22n possible inputs. Thus, there exists a sample which simultaneously works for all inputs (for a large enough value of 100). The protocol may restrict itself to the sample without loss of generality. Then Alice, can pick one of the sample strings by private coins, and communicate the choice with lg(100n) bits. 

This lemma shows that the gap that we saw in Equality is the maximum possible. The take-away message is that we almost never worry about private-coin complexity, since this restriction changes the complexity so little. When people say "randomized protocol" they almost always mean "public-coin randomized protocol."

Lemma 2. Say some problem has a randomized (public-coin) protocol with error probability ε. Then, for any distribution D, there exists a protocol with distributional error ε.

Proof. Let e(x,y)=1 iff the protocol makes an error on inputs x,y. The randomized protocol gives (∀) x,y: Ecoins[e(x,y)] ≤ε.

Since every term is bounded, so is the mean: E(x,y)~D[ Ecoins [e(x,y)] ] ≤ε.

But then we can switch the expectations: Ecoins [ E(x,y)~D [e(x,y)] ] ≤ε.

Finally, there exists a setting not exceeding the mean: (∃) coins: E(x,y)~D[e(x,y)] ≤ε. Having fixed the public coins, the protocol is deterministic.

Observe that the protocol depends on the distribution D, because the coins are fixed after seeing the distribution, and they're built into the protocol. 

So what does this mean in practice? Just like randomized algorithms are more natural for upper bounds, distributional analyses are more natural in lower bounds. A natural lower bound starts by proposing a distribution D on the inputs that makes the problem hard. Then, it proves that no protocol can work well on that distribution, implying a distributional lower bound. This is the strongest statement of the lower bound.

However, as an added bonus, the lower bound implies that no randomized protocol can solve the problem any better. If you think about it, this is obvious: any randomized algorithm can be made deterministic by picking its best set of coins (the coins minimizing error on the hard distribution).

In principle, it might be possible to show randomized lower bounds directly, without also showing distributional lower bounds. This would be somewhat odd, because it would mean you are showing a lower bound, but you cannot tell us how to choose hard inputs (then why is the problem hard?). Currently, all known randomized lower bounds also give distributional lower bounds.

The above lemma is sometimes called "the easy direction of Yao's lemma." I oppose the terminology, since it's sily to be giving credit for kindergarden observations. The real "Yao's Lemma" (or the hard direction thereof) is the following:

Lemma 3. Assume that for any D, there exists a protocol with distributional error ε. Then, there exists a randomized protocol with error ε.

This tells us that trying to find a hard distribution always makes sense. If there exists no distribution that makes the problem hard, it must be that it's actually easy (for randomized algorithms). This is an interesting philosophical statement ("you can always find a hard distribution"), but of course, it is never used in practice.

The proof is based on the von Neumann's minimax theorem (something you might learn about in economics and game theory). The proof is not important for communication lower bounds, though the minimax principle is something well worth understanding.


Anonymous said...

Nice posts! There is a typo I guess. Last item "The deterministic " should be "The distributional". Isn't it?

Mihai said...

Fixed, thanks.

Jeremy said...

This series of posts is too good!

I wonder, how many episodes in this series are there in your mind, and when will the last episode be written? I hope it will not last too long. (This episode 2 is posted nearly 1 year after episode 1.)

By the way, a typo is in your pdf file summerizing episode 1: in the step 2 of the proof, you should define S = S_1 \cup S_2 \cup ..., not S = S_1 \cap S_2 \cup ...

Mihai said...

I envision about 15-20 posts in this series. Hopefully I will actually complete it :) I am trying to write at least one post a week from now on.