Wednesday, April 8, 2009

CC4: One-Way Communication and a Puzzle

As we will discuss in the next post, for some applications it suffices to get lower bounds on the one-way communication complexity -- that is, the case in which a single message is sent, either from Alice to Bob, or from Bob to Alice. The party receiving the message must immediately announce the answer, based on the message and their own input.


Referring to our INDEXING lower bound in CC3, we can immediately obtain a one-way lower bound by setting a=0 (one message Bob→Alice) or b=0 (one message Alice→Bob). We conclude that Alice must send Ω(n) bits in the A→B model, and that Bob must send Ω(lg n) bits in the B→A model. The former case is much more interesting in applications, so when people say "the one-way communication complexity of INDEXING," they always mean the Alice→Bob model.

One-way communication is often simple to analyze, and we can obtain lower bounds by direct appeal to information theory. Here is a short and simple proof of the INDEXING lower bound:

Lemma. The one-way communication complexity of INDEXING is Ω(n).

Proof. Assume for contradiction that there exists a protocol in which Alice sends at most n/10 bits, and which works with probability (say) 0.99 over the uniform distribution on inputs. We use this to construct an encoder for a random vector of n bits. We show that the average length of the encoding is strictly less than n, which is a contradiction because the entropy of a random n-bit string is n.

Say we want to encode A[1..n]. We give Alice the input A and include her message M in the encoding. Then we specify the set of indices i such that, if Bob has input i and receives message M from Alice, he gives an incorrect answer to INDEXING. This is the entire encoding.

Claim 1: we can decode A from the encoding. To accomplish this, simulate the action of Bob for every possible index i. For every index in the bad set, A[i] is the opposite of what Bob says; for every other index, A[i] is exactly what Bob says.
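
To make the encoding and decoding concrete, here is a minimal Python sketch. The functions alice_message and bob_answer are hypothetical stand-ins for the assumed n/10-bit protocol; they are not part of the proof, just names for illustration.

    # Encoder/decoder sketch for the proof above. alice_message and bob_answer
    # stand in for the assumed n/10-bit one-way protocol (hypothetical names).
    def encode(A, alice_message, bob_answer):
        msg = alice_message(A)  # at most n/10 bits by assumption
        bad = {i for i in range(len(A)) if bob_answer(msg, i) != A[i]}
        return msg, bad         # bad is written down using lg(n choose k) bits

    def decode(n, msg, bad, bob_answer):
        # Claim 1: simulate Bob on every index and flip his answer on the bad set.
        return [1 - bob_answer(msg, i) if i in bad else bob_answer(msg, i)
                for i in range(n)]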

Claim 2: the average size of the encoding is at most 0.85 n. The first component (Alice's message) has size n/10 by assumption. The second component has size lg(n choose k), where k is the number of bad indices for a particular A. How do we analyze the expected size of this component?
  • E_A[k] ≤ 0.01 n. Indeed, for uniformly random A and i, the algorithm works with probability 0.99, so in expectation over A, the bad set of indices has size at most 0.01 n.
  • By the Markov bound, Pr_A[k ≤ 0.02 n] ≥ 1/2.
  • When k ≤ 0.02 n, lg(n choose k) ≤ n/2.
  • When k > 0.02 n, we still have lg(n choose k) ≤ n.
  • Therefore E[lg(n choose k)] ≤ (1/2)·(n/2) + (1/2)·n = 0.75 n.
Therefore, the total size of the encoding is at most n/10 + 0.75 n = 0.85 n in expectation, a contradiction.

A puzzle. Since this blog has featured algorithmic puzzles recently, it's time for a lower bound puzzle:
Say Alice and Bob each receive a set of n elements from the universe [n²]. Show that the one-way randomized communication complexity of set disjointness is Ω(n lg n) bits.
This is the best result possible, since Alice can specify her entire set with O(n lg n) bits, after which Bob knows the answer with no error. 

In fact, the O(n lg n) upper bound holds even for large universes: use a universal hash function mapping the universe to a range of size 100 n². With probability 99%, the hash function introduces no collisions among the 2n elements, so it does not create false positives in the intersection.
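
For concreteness, here is a small Python sketch of this hashing trick; the particular hash family ((a·x + b) mod p, then mod 100 n²) and the constants are illustrative choices rather than anything prescribed above.

    import random

    def disjoint_after_hashing(S, T, n):
        # Hash both n-element sets into a range of size 100 n^2 and test
        # disjointness there. A collision between S and T can only create a
        # spurious intersection, and with high probability there is none.
        m = 100 * n * n
        p = (1 << 61) - 1            # a large prime; assumes elements are below p
        a, b = random.randrange(1, p), random.randrange(p)
        h = lambda x: ((a * x + b) % p) % m
        return not ({h(x) for x in S} & {h(y) for y in T})

Alice can then simply send her n hashed values, O(n lg n) bits in total, and Bob announces the answer.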

Acknowledgements. Thanks to Mert Sağlam for telling me about this question. As far as I know, the proof is folklore. David Woodruff wrote an email describing this result many years ago.


5 comments:

Anonymous said...

King of lower bounds says...

SPOILER: Map Alice's (n log n)-bit string x to the set {a[1],...a[n]} where the i-th block of log n bits of x encodes a[i]. Map Bob's index (which falls in some block j) to an appropriate (n/2)-sized subset of [(j-1)n+1 .. jn].

COUNTER-PUZZLE: Is n^2 the smallest universe size for which this result is possible?

Anonymous said...

Oops, first set above should be {a[1],n+a[2],...(n-1)+a[n]}.

Anonymous said...

Aargh, n(n-1)+a[n].

There! Lots of comments to your post.
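
(For readers who want the corrected reduction spelled out, here is a small Python sketch using 0-based blocks and offsets; the function names are just for illustration.)

    def alice_set(a):
        # a = (a[0], ..., a[n-1]) with each a[i] in {0, ..., n-1}; block i of the
        # universe [n^2] is {i*n + 1, ..., i*n + n}; Alice picks one element per block.
        n = len(a)
        return {i * n + a[i] + 1 for i in range(n)}

    def bob_set(n, j, p):
        # Bob wants bit p of a[j]: take the n/2 elements of block j whose
        # within-block offset has bit p equal to 1.
        return {j * n + v + 1 for v in range(n) if (v >> p) & 1}

    # The two sets intersect iff bit p of a[j] is 1, so a one-way protocol for
    # disjointness yields a one-way protocol for INDEXING on n*lg(n) bits.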

Jeremy said...

Thanks, Mihai. I learn another lesson. Looking forward to the next ...

Anonymous said...

Here is another proof that exhibits the distribution explicitly and resorts to standard direct sum tools.

We define a product distribution µ = µ_A × µ_B such that, when (x,y) ~ µ, Pr[x and y are disjoint] = 1/2.

Partition the universe [2n^2] into n blocks of size 2n. For Alice's input, select a single uniformly random element from each block (independently). Bob's input distribution is a mixture of n distributions, µ_B = (1/n) Σ_i D_i, where D_i is defined as follows: select a uniformly random subset of size n from block i, and choose no elements from the other blocks.

Denote Alice's and Bob's inputs X = X_1...X_n and Y = Y_1...Y_n. Suppose there is an eps-error protocol in which Alice sends the message A. We have I(X : A) <= |A|. By averaging, for at least half of the blocks I(X_i : A) <= 2|A|/n. Among this half there exists an i such that the error on µ_A × D_i is smaller than 2·eps. Therefore the information cost of the problem is at least n/4 times the information cost of a single membership problem (with universe [2n]).
The lemma follows from IC_uniform,2eps(MEM_n) = Omega(log n).

To answer King of lower bounds's question: the lemma holds for any universe of size n^{1+eps}, for any eps > 0.

Does anyone know a distribution which is hard for both one-way and arbitrary-round protocols (i.e., one that requires n log n bits for one-way protocols and n bits for arbitrary rounds)?