## Friday, May 23, 2008

### Being Rich (I)

[Updated May 26: writing improved based on a "referee report" by an anonymous commenter. Let me also point out that use of paper and pencil while reading this text is encouraged.]

Let switch gears from politics to science... I am going to run a short series of posts talking about asymmetric communication complexity and how it relates to data structures. In this post, I start by defining the problems and the model, and give a simple (but very interesting) example of a lower bound.

No background is assumed. I believe a high school student should be able to follow this as a gentle introduction to lower bounds. (Yes, yes, I mean a high school student educated in Easter Europe, who knows the European "basic Math." For example, we use (1 + 1/n)n --> e at some point.)

If there are steps that you want clarified, you are welcome to post a question.

Set disjointness. Today, we start with a simple, abstract problem: set disjointness. Let's say Alice has a set S, Bob has a set T, and they are trying to communicate somehow, to determine whether S and T intersect. To quantify the problem, say |S|=n, |T|=m, and both sets come from a universe [u]={1,...,u}.

For now, we will only be interested in deterministic communication protocols (as opposed to protocols using randomization). In "symmetric" communication complexity, we are primarily interested in the dense case, n=m=Θ(u), and we ask how many bits of communication Alice and Bob must exchange to solve the problem correctly.

The trivial upper bound is u+1: Alice sends a bit vector of size u, specifying her set, and Bob replies with the answer. Alternatively, Bob can send his bit vector, and Alice computes the answer. It can be shown easily (see below) that any protocol must use Ω(u) bits of communication in the worst case, so the trivial protocol is asymptotically optimal.

In the asymmetric case, assume n<<m, and m=Θ(u). Because the input sizes are rather different, it is more interesting to measure the number of bits sent by Alice and Bob separately, instead of measuring the total communication. Denote by a the number of bits sent by Alice throughout the protocol, and b the number of bits sent by Bob.

Clearly, there is a trade-off between a and b. The trivial protocols from above give two points on the trade-off curve. Alice can describe her set using a = lg(u choose n) = O(n lg(u/n)) bits, and Bob replies with b=1 bit. Alternatively, Bob can describe his set using b=u bits, and Alice replies with a=1 bit.

Pause for a second, and try to come up with the best trade-off curve interpolating between these two extreme points.

x
x
x
x
x
This space intentionally left blank.
x
x
x
x
x

Here is how the trade-off looks like:
Values of a = o(n) are ineffective, i.e. Bob cannot save asymptotically over sending u bits. Similarly for values of b=o(n). The interesting portion of the trade-off is the curved one, which is described by: b = u / 2O(a/n).

Here is how we can achieve this trade-off. First, let k≥a be a parameter. We break the universe [u] into k blocks of roughly equal size, u/k. Now:

• Alice begins by specifying the blocks containing her elements. This takes a=lg(k choose n) = O(n lg(k/n)) bits (if there are fewer than n distinct blocks, pad arbitrarily).
• for every block containing an element of Alice, Bob sends a bit vector of size u/k, specifying which elements are in T. This takes b = nu/k bits.
• Alice replies with one more bit giving the anwer.
We have a=O(n lg(k/n)) and b=nu/k. Now eliminate the parameter: k=n*2O(a/n) and thus b = u / 2O(a/n).

Lower bounds. We are now going to prove that this trade-off is the best possible for determinstic protocols. This proof originates from [Miltersen, Nisan, Safra, Wigderson STOC'95], the paper that defined asymmetric communication complexity.
Without disparaging this great paper, I must say that the presentation is awful. The authors fail to realize the vast conceptual contributions that their asymmetric communication model can make. Think of the implications to understanding social interactions between humans (potentially, we may finally understand communication inside a marriage!). Instead, the authors chose to concentrate on boring technical aspects, like showing that their concept can prove lower bounds for computational problems.
To prove lower bounds for set disjointness, we are going to look carefully at the truth table of the function. This is a gigantic object, a 2D array with (u choose n) rows (corresponding to Alice's possible inputs), and (u choose m) columns (corresponding to Bob's inputs). Every entry is 1 if the set described by the column is disjoint from the set described by the row, and 0 otherwise.

The starting point of most communication lower bounds is the concept of (combinatorial) rectangles. A rectangle is a matrix minor of the truth table, i.e. a set AxB, where A is a subset of the rows and B is a subset of the columns. [Warning: this does not look like a rectangle in the geometric sense, since the rows in A and columns in B may not be consecutive.]

Suppose you are an outside observer, who doesn't know the inputs of Alice and Bob, but watches the communication taking places and tries to guess the inputs. After seeing some transcript of the communication, what have you learned about the input? It turns out, the possible problem instances that lead to a fixed communication trascript are always a combinatorial rectangle!

This can be seen by induction on the communication protocol. Before any communication, any inputs look plausible to the outside observer, i.e. the plausible inputs are the the entire truth table matrix. Say that in the next step, Alice sends a bit. This breaks the plausible inputs of Alice in 2 disjoint classes: the inputs for which she would send a 1, and the inputs for which she would send a 0. Observing the bit she sent, your belief about the input changes to a subset for Alice, and does not change in any way for Bob. Thus, your belief changes to a subrectangle, that drops some of the rows of the old rectangle. By analogous reasoning, when Bob sends a bit, your belief changes by dropping some columns.

Communication lower bounds follow by proving two facts:
1. If the communication protocol is short, the rectangle reached at the end is "large". (Intuitively, there weren't many steps of the induction to shrink it.) But in the final rectangle, all entries must be identical, either zero or one, because the protocol finishes by announcing the answer to the problem. In other words, if the protocol finished by announcing an answer, and there's still a plausible input for which the answer is different, the protocol is sometimes wrong.

2. Combinatorially, one shows that any large rectangle is bichromatic (contains both zeros and ones).
We are now going to show lower bounds for set disjointness, implementing these 2 steps.

1: Richness and large rectangles. How do we prove a useful result saying that the final rectangle must be "large"? For this, [MNSW] introduce the notion of "richness" (I don't know the story behind the name). We say (the truth table of) a function is [U,V]-rich if it contains at least V columns, each of which has at least U one-entries.

Lemma: If Alice and Bob compute a [U,V]-rich function by communicating a total of a, respectively b bits, the function contains a rectangle of size (U/2a) x (V/2a+b) with only one entries.

Proof: By induction on the length of the protocol. Let's say that we are currently in a rectangle AxB that is [U,V]-rich. We have two cases:
• Bob communicates the next bit. Let's say B0 is the set of columns for which he sends zero, and B1 is the set for which he sends one. Since AxB contains V columns with at least U ones, either AxB0 or AxB1 contain V/2 columns with at least U ones. We continue the induction in the [U,V/2]-rich rectangle.

• Alice communicates the next bit, breaking A into A0 and A1. Each of the V columns that made AxB rich has at least U/2 ones in either A0xB or A1xB. Thus, in either A0xB or A1xB we have find at least V/2 columns that have at least U/2 ones. We can continue the induction in a rectangle that is [U/2, V/2]-rich.
At the end of the protocol, we reach a monochromatic rectangle that is [U/2a, V/2a+b]-rich. Since the rectangle has nonzero richness, it contains some ones, and therefore it contains only ones. Furthermore, it must have size at least U/2a by V/2a+b to accomodate the richness.

2: Rectangles in set disjointness. Remember that we are interested in the case of a dense T; let's assume m=u/2 for concreteness. A column of the truth table is a set T of m=u/2 elements from the universe [u]. For each T, there are (u/2 choose n) sets S which are disjoint from T. Thus, the problem is [(u/2 choose n), (u choose u/2)]-rich. By the richness lemma, we obtain a one-rectangle of size (u/2 choose n) / 2a by (u choose u/2) /2a+b.

Assume we have a rectangle {S1, S2, ...} x { T1, T2, ...} that is all ones. Then, evey Si is disjoint from every Tj. Defining S'= S1 ∪ S2 ∪ ... and T'= T1 ∪ T2 ∪ ..., it must be that S' is disjoint from T', therefore
(1) |S'| + |T'| ≤ u.
But also, the size of the rectangle is at most (|S'| choose n) by (|T'| choose u/2), because every row is an n-subset of S', and every column an m-subset of T'. Therefore:
(2) (|S'| choose n) ≥ (u/2 choose n) / 2a
(3) (|T'| choose u/2) ≥ (u choose u/2) / 2a+b
We now pull the following binomial inequality out of our hat:
(A choose C) / (B choose C) ≥ (A/B)C
Then (2) becomes (u/(2|S'|))n < 2a, and thus |S'| > u / 2O(a/n). From (1) we have |T'| ≤ u - |S'| ≤ u (1 - 2-O(a/n)). Applying the binomial inequality to (3), we have:
(u / |T'|)u/2 ≤ 2a+b => (1+2-O(a/n))Θ(u) ≤ 2a+b => eu/2O(a/n) ≤ 2a+b => b ≥ u / 2O(a/n), QED.
The first implication used 1 / (1 - ε) > 1 + ε. The second implication used (1 + 1/A)B = ((1 + 1/A)A)B/A = eΘ(B/A).

Anonymous said...

Quite a good introduction to communication complexity. However, I do not think a highschool student would be able to digest all the material (well, maybe if (s)he spends several hours on it), because of the "mathematical maturity" needed.

I do have some suggestions:
- "... is [U,V]-rich if at least V columns have at least U one-entries" would be way clearer stated as "it contains at least V columns, each of which has at least U one-entries.".

- "a column of the truth table is a set T of, say, m=u/2 elements from the universe [u]" -- remind the reader m = \Theta(u) and thus you can have m \approx (u/k) for some constant k (and you suppose k=2 for brevity). Also, I think you should say "subset", not "set", because a column may represent a set of less than m elements.

- explicitly say choose has a lower priority of /, thus u/2 choose k is (u/2) choose k, not u/(2 choose k).

- when you say "Then (2) becomes (u/(2|S'|))^n < 2a", you mean "2^a" there.

- explain why the maximal size is (S' choose n) by (T' choose m=u/2).

- technically (A choose C)/(B choose C) >= (A/B)^C :D, not > (ok, ok, don't get too mad over this nitpicking)

- you should explain the last two implications, for those for whom calculus is not their mother tongue ;-)

That said, I think the presentation is very good and readable, even though to get through the last part I had to pull out the computer scientist's best tools: pen and paper.

Suresh said...

Why is asymmetric CC interesting, apart from providing a higher-powered lens for the "asymmetric" case of Alice and Bob communication. For example, if we take streaming problems, where CC lower bounds are often used, is there any example of a problem that uses asymmetric CC in a nontrivial way ?

Anonymous said...

Suresh, asymmetry has two components, the order in which the players speak and the number of bits each can speak. The former is used in streaming all over the place. However most reductions in streaming use reduction where all players have the same memory footprint, or otherwise speak the same number of bits. So you probably asked for the latter.

The latter (the restriction of bits spoken by each player) is best suited when the sides are unequal - say when we are ineracting with a data structure (as was the historical and more recent reason for the study of richness).
Other than data strucutures, even in streaming, maybe there is a result possible that "to solve problen X, and guaranteeing per input processing time y, we would need Z bits of space whereas X has a better algorithm which uses less than Z bits but
may take more time than y on a few inputs (and takes less amortized time, to make things interesting)". This really would be a
lower bound for updateable data strucutres, but at some point the distinctions are blurry.
sudipto.

MiP said...

Anon, thank you for the suggestions. I implemented them in the text (with a few exceptions).

I maintain that an Easter European high school student can follow this (with pencil and paper, sure). I'll ask some people in this class to see what they think :)

MiP said...

Suresh,

As Sudipto also said, the big application of asymmetric communication complexity is data structures, not streaming. I will explain in the next post why symmetric communication is useless for data structures.

[MNSW] also had some applications to "depth hierarchies for monotone AC0." But I think monotone AC0 is too "pathological" for most of us.

дед мороз said...

Awesomeness! I have to admit that before reading this MNSW 95 seemed to me a little hard to crack.
Thanks for the intuition your post is offering!

Cafer said...

"Without disparaging this great paper, I must say that the presentation is awful. ... (potentially, we may finally understand communication inside a marriage!). ... "

Mihai, please tell me that the above text wasn't a referee comment that Miltersen received.