Saturday, May 2, 2009

The non-Gödel papers

As you probably know, I think theory is in a deep crisis: our obsession with turning into philosophy (providing unappreciated service to disciplines we do not understand), stemming from our deep commitment to become as ignorant and irrelevant as possible inside the CS department; the influx of people who like to pretend we're Mathematics, and have no idea what computation means; the obsession with "polynomial" versus "non-polynomial" and its many consequences; etc. etc. etc. Since I am not inclined to write novels, I have never been able to put all these arguments into a coherent text.


Fortunately, every once in a while, the ball is raised at the net and I can say something short that expresses my feelings. Richard Lipton talks about the Gödel Prize, and some papers that could also have won it. His examples are all in complexity, which is reasonable since he's a complexity theorist. But looking at the list of winners on Wikipedia, what it lacks most is algorithms — you know, papers that try to understand how fast some problems can be solved (the thing non-theorists believe we're studying).

Here is my own "forgotten list," from which I am certain to have forgotten some papers. Please add your favorite examples in the comments — but note that I am deliberately discarding old papers that were ineligible, like splay trees, classic flow algorithms, persistence, etc.

In no particular order:
  • [Fredman-Willard] on the fusion tree. This kick-started a major field (integer search) and directly led to at least two dozen follow-up papers in STOC-FOCS-SODA. Candidates for the Gödel Prize do not get any better.

    This paper has passed its eligibility period, unfortunately. Another good candidate in this line of work would be [Thorup] solving undirected shortest paths in linear time.

  • Karger's papers on mincut, which are very influential on the use of randomization in graph algorithms.

  • Clarkson's introduction of randomization to geometry (thanks, Suresh!).

  • Chazelle's series on "Lower Bounds for Orthogonal Range Searching" is a remarkable contribution. However, Chazelle should probably just win the Knuth Prize since his contributions are too varied (MST, fractional cascading, triangulation, etc).

  • the power of two choices is an influential and popular idea. I'm not sure who should get the award though; as with PCPs, it can perhaps be shared by a couple of papers.

  • dynamic graph algorithms are a broad, deep, and popular research topic. [Henzinger-King] and [Holm-de Lichtenberg-Thorup] could win a joint award for polylogarithmic connectivity. These papers basically restarted the whole area and led to maybe 2 dozen publications in STOC-FOCS-SODA. The latter is also a popular data structure to teach, for its basic beauty.

    Another candidate pair for a joint award would be [Demetrescu-Italiano] and [Sankowski], which give remarkable solutions to all-pairs shortest paths and transitive closure.

Add your own favorite examples in the comments.

So, am I claiming an anti-algorithmic bias because these papers did not win? No, all sorts of randomness is involved in prizes. I am claiming an anti-algorithmic bias because, in the current environment, it seems such papers are not even viable candidates.

Sunday, April 19, 2009

Christos a înviat!

Happy Easter!  Christos a înviat! (Christ is risen!)

Tuesday, April 14, 2009

Theory and Practice

Via David Eppstein, I find out about the Workshop on Theory and Multicores. A memorable quote from the call:

[...] Indeed, the programs of many mainstream computer science conferences, such as ASPLOS, DAC, ISCA, PLDI and POPL are heavily populated with papers on parallel computing and in particular on many-core computing. In contrast, the recent programs of flagship theory conferences, such as FOCS, SODA and STOC, hardly have any such paper. This low level of activity should be a concern to the theory community, for it is not clear, for example, what validity the theory of algorithms will have if the main model of computation supported by the vendors is allowed to evolve away from any studied by the theory. The low level of current activity in the theory community is not compatible with past involvement of theorists in parallel computing, and to their representation in the technical discourse.
Which theory community? The one that thinks our machine models are not robust enough to tell the difference between O(n lg n) and O(n^5)? The one that thinks n^(O(1/ε^2)) is polynomial and even efficient? The one that thinks Amazon should be happy with an O(lg^3 n) approximation for its profit? The one that thinks the coolest conclusions we ever came to are the philosophical implications of interactive proofs and zero knowledge?

The theory community will do just fine, thank you very much.

As for "the low level of current activity" being incompatible "with past involvement of theorists in parallel computing" --- it is exactly the past involvement that led to the current attitude towards parallel computing! Parallel computing is the fabled field where we got burned badly, and the first example people use to argue that theory is still very young. (It usually goes like this: "Imagine an equivalent of Perelman disappearing for 20 years and coming back with an answer to the most burning question about PRAMs from 1985. Who would care now?").

It's going to be hard to regain enthusiasm about parallel computing, as timely as the moment may be.

But at least we learned our lessons. Never again will a hyped-up mob of theorists rush head-on to study a machine model too early. Never will we study a machine model before such computers are even built, and while the practicality of such computers is still debated. We understand that producing a theoretical model too early will have us chasing the wrong rabbits, and that our results might be as relevant as the 5-state 17-symbol universal Turing Machine was in the design of the Pentium.


Alright, enough of this. I should go back to reading about 3-round quantum interactive proofs with entanglement.

Wednesday, April 8, 2009

Puzzle: Short-hop permutations

Count the number of permutations of size n, such that |π(i) - π(i+1)| ≤ k, for all i. That is, the hop between any two consecutive values must be short.


I am asking for an algorithm, not a closed formula. (Combinatorialists: is one known?)

Your algorithm should work fast for values like n = 10^6 and k = 8. I have not thought hard about supporting larger k (say, k = 20), but I would certainly be interested to hear such a solution if you have one.
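For concreteness, here is a minimal brute-force counter in Python — only a sanity check for tiny n, emphatically not the intended algorithm for n = 10^6 (the function name is mine):

    from itertools import permutations

    def count_short_hop(n, k):
        # Count permutations pi of {1..n} with |pi[i] - pi[i+1]| <= k for all i.
        # Brute force over all n! permutations; use only to verify small cases.
        total = 0
        for pi in permutations(range(1, n + 1)):
            if all(abs(pi[i] - pi[i + 1]) <= k for i in range(n - 1)):
                total += 1
        return total

    print(count_short_hop(5, 2))   # enumerates all 120 permutations of size 5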

Credits. This problem was used in a selection contest for the Romanian olympic team. The original author was Marius Andrei. My subjective judgment of its difficulty is "average+" (it was tricky, but I solved it within 10 minutes). YMMV.

CC4: One-Way Communication and a Puzzle

As we will discuss in the next post, for some applications it suffices to get lower bounds on the one-way communication complexity -- that is, the case in which a single message is sent, either from Alice to Bob, or from Bob to Alice. The party receiving the message must immediately announce the answer, based on the message and their own input.


Referring to our INDEXING lower bound in CC3, we can immediately obtain a one-way lower bound by setting a=0 (one message Bob→Alice) or b=0 (one message Alice→Bob). We conclude that Alice must send Ω(n) bits in the A→B model, and that Bob must send Ω(lg n) in the B→A model. The former case is much more interesting in applications, so when people say "the one way communication complexity of INDEXING," they always mean the Alice→Bob model.

One-way communication is often simple to analyze, and we can obtain lower bounds by direct appeal to information theory. Here is a short and simple proof of the INDEXING lower bound:

Lemma. The one-way communication complexity of INDEXING is Ω(n).

Proof. Assume there exists a protocol in which Alice sends n/10 bits, which works with probability (say) 0.99 over the uniform distribution. We use this to construct an encoder for a random vector of n bits. We show that the average length of the encoding is strictly less than n, which is a contradiction because the entropy of the random bit string is n.

Say we want to encode A[1..n]. We give Alice the input A, and include her message M in the encoding. Then, we specify the set of indices i, such that, if Bob has input i and receives message M from Alice, he will give an incorrect answer to INDEXING. This is the entire encoding.

Claim 1: we can decode A from the encoding. To accomplish this, simulate the action of Bob for every possible i. For every index in the bad set, A[i] is the opposite of what Bob says; for every other index, A[i] is exactly what Bob says.
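To make the encoder and decoder concrete, here is a small Python sketch; alice_message and bob_answer are hypothetical stand-ins for the assumed protocol (with its random coins fixed), and the only point is that the pair (message, bad set) determines A:

    def encode(A, alice_message, bob_answer):
        # Encoding of the bit vector A: Alice's message (at most n/10 bits by
        # assumption) plus the set of indices on which Bob would answer wrong.
        M = alice_message(A)
        bad = [i for i in range(len(A)) if bob_answer(M, i) != A[i]]
        return M, bad

    def decode(n, M, bad, bob_answer):
        # Recover A by simulating Bob on every index; flip his answer on the bad set.
        bad = set(bad)
        return [1 - bob_answer(M, i) if i in bad else bob_answer(M, i)
                for i in range(n)]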

Claim 2: the average size of the encoding is at most 0.85 n. The first component (Alice's message) is n/10 in size by assumption. The second component has size lg(n choose k), where k is the number of bad indices for a particular A. How do we analyze the expected size of this component?
  • E_A[k] = 0.01 n. Indeed, for uniformly random A and i, the algorithm works with probability 0.99. So in expectation over A, the bad set of indices has size 0.01 n.
  • By the Markov bound, Pr_A[k ≤ 0.02 n] ≥ 1/2.
  • When k ≤ 0.02 n,  lg(n choose k) ≤ n/2.
  • When k ≥ 0.02 n,  lg(n choose k) ≤ n.
  • Therefore E[lg(n choose k)] ≤ 0.75 n.
Therefore, the total size of the encoding is 0.85 n in expectation, contradiction.

A puzzle. Since this blog has featured algorithmic puzzles recently, it's time for a lower bound puzzle:
Say Alice and Bob receive two sets of n elements each from the universe [n^2]. Show that the one-way randomized communication complexity of set disjointness is Ω(n lg n) bits.
This is the best result possible, since Alice can specify her entire set with O(n lg n) bits, after which Bob knows the answer with no error. 

In fact, the O(n lg n) complexity holds even for large universes: use a universal hash function from the universe to a range of 100 n^2. The hash function introduces zero collisions with probability 99%, so it doesn't introduce false positives in the intersection.
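Here is a rough Python sketch of that protocol, under the assumption that Alice and Bob share the randomness used to pick the hash function; the names are mine, not a standard API:

    import random

    def one_way_disjointness(S, T, n, seed):
        # Alice hashes her n-element set into a range of size 100*n^2 and sends the
        # hashed set (O(n lg n) bits); Bob checks his own elements against it.
        # A collision (probability <= 1%) can only cause a false "intersecting" answer.
        rng = random.Random(seed)            # shared randomness picks the hash function
        p, R = 2**61 - 1, 100 * n * n        # prime modulus and hash range
        a, b = rng.randrange(1, p), rng.randrange(p)

        def h(x):                            # universal-style hash into [0, R)
            return ((a * x + b) % p) % R

        message = {h(x) for x in S}          # this is all Alice sends
        return all(h(y) not in message for y in T)   # True means "disjoint"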

Acknowledgements. Thanks to Mert Sağlam for telling me about this question. As far as I know, the proof is folklore. David Woodruff wrote an email describing this result many years ago.


Monday, April 6, 2009

Puzzle: Cycle Containing Two Nodes

The following simple puzzle was circulating among Romanian olympiad participants around 1998. It was supposed to be a quick way to tell apart algorithmists from mere programmers.

Given an undirected graph G, and two vertices u and v, find a cycle containing both u and v, or report that none exists. Running time: O(m).
Update: Simple ideas based on a DFS (like: find one path, then find another) do not work. Think of the following graph (figure not shown):
If you first find the path u → a → b → v, you will not find a second.

The one-line answer is: try to route two units of flow from u to v in the unit-capacity graph (with unit node capacities if you want simple cycles). This is not the same as two DFS searches, because the second DFS is in the residual graph (it can go back on the edges of the first DFS).
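Here is a short Python sketch of that answer: two BFS augmentations in the unit-capacity residual graph, which decide whether two edge-disjoint u-v paths exist (add node splitting on top of this if you insist on a simple cycle, as noted above):

    from collections import defaultdict, deque

    def two_edge_disjoint_paths(edges, u, v):
        # Unit-capacity max-flow: each undirected edge becomes two arcs of capacity 1.
        cap = defaultdict(int)
        adj = defaultdict(set)
        for x, y in edges:
            cap[(x, y)] += 1
            cap[(y, x)] += 1
            adj[x].add(y)
            adj[y].add(x)

        def augment():
            # One BFS for an augmenting u -> v path in the residual graph.
            parent = {u: None}
            queue = deque([u])
            while queue:
                x = queue.popleft()
                for y in adj[x]:
                    if y not in parent and cap[(x, y)] > 0:
                        parent[y] = x
                        queue.append(y)
            if v not in parent:
                return False
            y = v
            while parent[y] is not None:     # push one unit of flow along the path
                x = parent[y]
                cap[(x, y)] -= 1
                cap[(y, x)] += 1             # residual arc: the second search may undo this edge
                y = x
            return True

        return augment() and augment()       # flow of value 2 <=> two edge-disjoint u-v paths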


About the previous puzzle: As many people noticed, Alice can guarantee a win or a draw. She computes the sum of the elements on odd positions and the sum on the even positions. Depending on which is higher, she only plays odd positions or only even positions. (Bob has no choice, since the subarrays he's left with always have the ends of the same parity.)

But how do you compute the optimal value for Alice? If the even and odd sums are equal, how can Alice determine whether she can win, or only draw? A simple dynamic program running in O(n^2) time works. Can you solve it faster?
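For reference, a sketch of that O(n^2) dynamic program in Python: dp[i][j] is the best achievable score difference (current player minus opponent) on the subarray a[i..j] when both play optimally (the function name is mine):

    def alice_advantage(a):
        # Interval DP for the take-from-either-end game, O(n^2) time and space.
        n = len(a)
        dp = [[0] * n for _ in range(n)]
        for i in range(n):
            dp[i][i] = a[i]
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length - 1
                # Take the left end or the right end; the opponent then plays optimally.
                dp[i][j] = max(a[i] - dp[i + 1][j], a[j] - dp[i][j - 1])
        return dp[0][n - 1]   # > 0: Alice wins, 0: draw, < 0: Bob wins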

Friday, April 3, 2009

CC3: the good, the bad, the ugly

This is the 3rd post in the thread on communication complexity, following Episodes 1 and 2.


This post is also available in PDF.

In today's episode, we will prove our first randomized/distributional lower bound. We will consider a problem that appears quite trivial at first sight -- INDEXING, defined as follows:
  • Alice receives a bit vector,  x ∈ {0,1}^n
  • Bob receives an index,  y ∈ [n]        (note standard notation: [n] = {1,...,n})
  • their goal is to output x_y, i.e. the y-th bit of the vector x.
One way to view INDEXING is as the simplest case of lopsided set disjointness, where Bob's set is of size one: Alice receives the set S = { i | x_i = 1 }, and Bob receives T = { y }.

Intuition. What trade-off should we expect between Alice's communication a, and Bob's communication b? Clearly, the problem can be solved by [a=1, b=lg n] and by [a=n, b=1]. 

In between these two extremes, the best use of b seems to be for Bob to send b bits of his index. Alice now knows the index to be in a set of n/2^b possibilities. She can simply send all her bits at these positions, using communication a = n/2^b. Finally, Bob announces the answer with one more bit.
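A quick Python sketch of this protocol (the function and argument names are just for illustration): Bob's b bits identify a block of about n/2^b positions, Alice sends her bits in that block, and Bob reads off the answer.

    def indexing_tradeoff(x, y, b):
        # x: Alice's bit vector (0-indexed list of 0/1); y: Bob's index; b: bits Bob sends.
        n = len(x)
        block = (n + (1 << b) - 1) >> b      # positions per block: ceil(n / 2^b)
        bob_msg = y // block                 # Bob reveals the "top b bits" of his index
        alice_msg = x[bob_msg * block : (bob_msg + 1) * block]   # ~n/2^b bits from Alice
        return alice_msg[y - bob_msg * block]   # Bob announces x[y] with one more bit

    # e.g. indexing_tradeoff([0, 1, 1, 0, 1, 0, 1, 1], y=5, b=2) returns x[5] = 0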

We thus showed the upper bound a ≤ n/2^(b-1). It should be fairly intuitive that this is also a tight lower bound (up to constants). Indeed, no matter what Bob communicates, his index will be uncertain in a set of n/2^b possibilities (on average). If Alice sends less than n/2^(b+1) bits of information, at least half of the possible index positions will not have a specified answer based on Alice's message. In other words, the protocol fails to determine the answer with constant probability (i.e. makes constant error).


Distributions. Before proving a distributional lower bound, we must find the distribution that makes the problem hard. From the intuition above, it should be clear that the right distributions are uniform, both for Alice's vector and for Bob's index.


Rectangles. We are in the situation of product distributions: the inputs of Alice and Bob are independent. This is a very good situation to be in, if you remember the main take-home lesson from Episode 1: rectangles. Remember that some fixed communication transcript is realized in a combinatorial rectangle A x B, where A is a subset of Alice's possible inputs, and B a subset of Bob's possible inputs. The main ideas of a deterministic proof were:
  • little communication from Alice means A is large;
  • little communication from Bob means that B is large;
  • but the rectangle must be monochromatic (the answers must be fixed, since the communication is over)
  • if A and B are large, you must have both yes-instances and no-instances.
For a product distribution, we will perform essentially the same analysis, given that the densities μ(A) and μ(B) can be measured independently. The only difference will be that the rectangle is not monochromatic, but almost monochromatic. Indeed, if the protocol may make some errors, the answer need not be fixed in the rectangle. But it must happen that one answer (yes or no) is much more frequent -- otherwise the error is large.

The first three steps of the analysis are formalized in the following randomized richness lemma, from the classic paper of [Miltersen, Nisan, Safra, Wigderson STOC'95]:

Lemma. (randomized richness)  Say Alice and Bob compute a function f : X × Y → {0,1} on a product distribution over X × Y. Assuming that:
  • f is dense, in the sense that E[ f(x,y) ] ≥ α ≥ 1/5.
  • the distributional error is at most ε.
  • Alice communicates a bits, and Bob b bits.
Then, there exists a rectangle A × B (A ⊆ X, B ⊆ Y) satisfying:
  • Alice's side is large: μ(A) ≥ 2^(-O(a));
  • Bob's side is large: μ(B) ≥ 2^(-O(a+b));
  • the rectangle is almost monochromatic: E[ f(x,y) | x ∈ A, y ∈ B ] ≥ 1 - O(ε).
Proof: Though the statement is very intuitive, the proof is actually non-obvious. The obvious approach, which fails, would be some induction on the bits of communication: fix more bits of the communication, making sure the rectangle doesn't decrease too much, and the error doesn't increase too much in the remaining rectangle.

The elegant idea is to use the deterministic richness lemma. Let F be the output of the protocol (what Alice and Bob answer). We know that f and F coincide on a 1-ε fraction of the inputs. By definition, the protocol computes F deterministically with no error (duh!). It is also clear that F is rich, because it is dense: E[F] ≥ α - ε.

So apply the deterministic richness lemma from Episode 1. We get a large rectangle of F in which the answer is one. But how do we know that f is mostly one in the rectangle? It is true that F and f differ on only an ε fraction of the inputs, but that ε might include this entire rectangle that we chose! (Note: the rectangle size is ~ 2^(-a) × 2^(-b), so much much smaller than some constant ε. It could all be filled with errors.)

We were too quick to apply richness on F. Now define G, a cleaned-up version of F. Consider some transcript of the protocol, leading to a rectangle AxB. If the answer is zero, let G=0 on AxB. If the answer is one, but the error inside this rectangle is more than 10 ε, also let G=0. Otherwise, let G=1 on the rectangle.

How much of the truth table gets zeroed out because of excessive error (above 10 ε)? Well, the overall average error is ε, so we can apply a Markov bound to it: the mass of all rectangles in which the error exceeds 10 ε is at most 1/10.

Thus, G is also fairly dense: E[G] ≥ E[F] - 1/10 ≥ α - ε - 1/10 ≥ 1/10 - ε. Thus, G is rich, and we can find a big rectangle in which it is identically one. But in that rectangle, E[f] ≥ 1 - 10 ε by construction of G.


The good, the bad, the ugly. It remains to prove that in any large rectangle, the fraction of zeros must be non-negligible (it may not be almost all ones). This part is, of course, problem specific, and we shall do it here for INDEXING. 

Unfortunately, these proofs are somewhat technical. They typically apply a number of "pruning" steps on the rectangle. An example of pruning on the space of rectangles was seen above: we zeroed out all rectangles that had more than 10 ε error. In these proofs, we throw out rows and columns we don't like for various reasons. One usually makes many funny definitions, talking about "good rows", "bad rows", "ugly columns", etc.

While looking at such proofs, it is important to remember the information-theoretical intuition behind them. After you understand why the statement is true (handwaving about the information of this and the entropy of that), you can deal with these technicalities on the night before the STOC/FOCS deadline.

Here is how the proof proceeds for INDEXING:

Lemma: Consider any A ⊆ {0,1}^n and B ⊆ [n] such that |A| ≥ 2^(n+1) / 2^(|B|/2). Then E[f(x,y) | x ∈ A, y ∈ B] ≤ 0.95; that is, at least a 1/20 fraction of the rectangle consists of zeros.

Proof: Assume for contradiction that Pr[f(x,y) = 0 | x ∈ A, y ∈ B] ≤ 1/20. Let an ugly row be a row x ∈ A such that Pr[f(x,y) = 0] > 1/10, over a uniform y ∈ B. At most half of the rows are ugly, by the Markov bound. Discard all ugly rows, obtaining A' ⊆ A with |A'| ≥ |A|/2.

For every remaining x ∈ A', we have x_y = 1 for at least 0.9 |B| indices y from B (this is just the definition of a non-ugly row, since f(x,y) = x_y). Call these good indices.

There are (|B| choose 0.9|B|) = (|B| choose 0.1|B|) choices for the good indices. For every set of good indices, there are at most 2^(n - 0.9|B|) vectors x which are one on the good indices. Thus:
|A'|  ≤  (|B| choose 0.1|B|) · 2^(n - 0.9|B|)  ≤  2^n · 2^(O(0.1 |B| lg 10)) / 2^(0.9|B|)  =  2^n / 2^(0.9|B| - O(0.1 |B| lg 10))
This is at most 2^n / 2^(0.5|B|), at least for a sufficiently small value of 1/10 (I never care to remember the constants in the binomial inequalities; the standard bound lg(m choose δm) ≤ δm · lg(e/δ) does the job). We have reached a contradiction with the lemma's guarantee that A is large.

Combine this with the richness lemma with an error ε = 0.05. We have |A| ≈ 2^(n-a) and |B| ≈ n/2^b, so it must be the case that a = Ω(|B|) ≥ n/2^(O(b)). We have obtained a tight lower bound.