## Friday, February 12, 2010

### The 6 Months News Cycle

The STOC accepted papers, with some pointers to online versions, can be found here. I had two papers: Towards Polynomial Lower Bounds for Dynamic Problems and Changing Base without Losing Space. More on them later.

Posted by Mihai at 2:10 PM

## Tuesday, February 2, 2010

### Cuckoo Hashing

Today, we will prove bounds for the basic cuckoo hashing. We are going to place our *n* keys into two arrays A[1..b] and B[1..b], where b=2n. For this, we will use two hash functions, *h* and *g*. When some element *x* arrives, we first try to put it in A[h(x)]. If that location already contains some element *y*, try to move *y* to B[g(y)]. If that location already contains some *z*, try to move *z* to A[h(z)], and so on until you find a free spot.

The proper image to have in mind is that of a random bipartite graph. The graph will have *b* nodes on each side, corresponding to the locations of A and B. In this view, a key *x* is an edge from the left vertex h(x) to the right vertex g(x).

**Simple paths.** As a warm-up, let's deal with the case of simple paths: upon inserting *x*, the update path finds an empty spot without intersecting itself. It turns out that the update time of cuckoo hashing behaves like a geometric random variable:

The probability that insert(*x*) traverses a simple path of length *k* is 2^{-Ω(k)}.

I will prove this by a cute encoding analysis (you know I like encoding proofs). Let's say you want to encode the two hash codes for each of the *n* keys. As the hash functions *h* and *g* are truly random, this requires H = 2*n* lg *b* bits on average (the entropy). But what if, whenever some event *E* happened, I could encode the hash codes using H-Δ bits? This would prove that Pr[*E*] = O(2^{-Δ}): there are only O(2^{H-Δ}) bad outcomes that lead to event *E*, out of 2^{H} possible ones. Thus, the task of proving a probability upper bound becomes the task of designing an algorithm.

In our case, *E* = {insert(*x*) traverses a simple path of length *k*} and we will achieve a saving of Δ = Ω(k). Here is what we put in the encoding:

- one bit, saying whether the path grows from A[h(x)] or B[g(x)];
- the value *k*, taking O(lg *k*) bits;
- all edges of the path, in order, taking (k-1) lg *n* bits;
- all vertices of the path, in order, taking (k+1) lg *b* bits;
- the hash codes for all keys not on the path, in order, taking (n-k)·2 lg *b* bits.
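Tallying the parts (a sketch of the arithmetic, using lg *n* = lg(b/2) = lg *b* - 1 and absorbing constants into the O(lg k) term):

```latex
\begin{align*}
|\text{encoding}| &= 1 + O(\lg k) + (k-1)\lg n + (k+1)\lg b + (n-k)\cdot 2\lg b\\
                  &= (k-1)(\lg b - 1) + (k+1)\lg b + (2n-2k)\lg b + O(\lg k)\\
                  &= 2n\lg b - (k-1) + O(\lg k) = H - k + O(\lg k).
\end{align*}
```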

Observe that the hash codes of the *k* keys on the path are specified using essentially half the information, since a hash code is shared between two edges. Since lg *n* = lg(b/2) = lg *b* - 1, the encoding saves Δ = k - O(lg k) bits compared to H.

The intuition for why a *k*-path occurs with probability 2^{-Ω(k)} is simple. Say I've reached edge *y* and I'm on the right side. Then, the probability that B[g(y)] is collision free is at least 1/2, since there are only *n* keys mapped to a space of 2*n*. In other words, at each point the path stops with probability half. This is exactly what the encoding is saying: we can save one bit per edge, since it takes lg *n* to encode an edge, but lg(2n) to encode an endpoint.

**One cycle.** Let us now deal with the case where the connected component of *x* contains one cycle. It is tempting to say that cuckoo hashing fails in this case, but it does not. Here is what might happen to the update path when a cycle is part of the component (see figure):

- the path luckily avoids the cycle and finds a free location without intersecting itself. Cool.
- the path reaches B[g(x)], which is occupied by some key *y*. Note that this has closed the cycle through the *x* edge, but the *x* edge is not actually traversed. Following *y* to A[h(y)] must eventually reach a free spot (no more cycles).
- the path intersects itself. Then, it will start backtracking, flipping elements back to their positions before insert(x). Eventually, it reaches A[h(x)], where the algorithm had originally placed *x*. Following the normal cuckoo rules, *x* is moved to B[g(x)] and the exploration from there on must find an empty spot.

The probability that insert(*x*) runs for *k* steps remains 2^{-Ω(k)}. In cases 1 and 2, we can simply apply the encoding from before, since the path is simple. In case 3, let ℓ be the number of edges until the path meets itself. The number of edges after B[g(x)] is at least k-2ℓ. Thus, we have a simple path from x of length max{ℓ, k-2ℓ} = Ω(k), so the old argument applies.

**Two cycles.** We now arrive at the cases when cuckoo hashing really fails: the bipartite graph contains as a subgraph (1) a cycle with a chord; or (2) two cycles connected by a path (possibly a trivial path, i.e. the cycles simply share a vertex).

From the figure we see that, by removing two edges, we can always turn the bad subgraph into two paths starting at *x*. We first encode those two paths as above, saving Ω(k), where k = size of the subgraph. Now we can add to the encoding the two infringing edges. For each, we can specify its identity with lg *n* bits, and its two endpoints with O(lg k) bits (a lower-order loss compared to the Ω(k) saving). In return, we know their two hash codes, which are worth 4 lg *b* bits. Thus, the overall saving is at least 2 lg *n* bits.

We have shown that an insertion fails with probability O(1/n^{2}). By a union bound, cuckoo hashing will handle any fixed set of *n* elements with probability 1-O(1/n).

This bound is actually tight. Indeed, if three keys x, y, z have h(x)=h(y)=h(z) and g(x)=g(y)=g(z), then cuckoo hashing fails (this is the simplest obstruction subgraph). But such a bad event happens with probability (n choose 3)·b^{2}/b^{6} = Θ(1/n).

Posted by Mihai at 9:11 PM

### Better Guarantees for Chaining and Linear Probing

Last time we showed that chaining and linear probing enjoy O(1) expected running times per operation. However, this guarantee is fairly weak, and certainly does not explain the popularity of the schemes. If you had a computation that was expected to run for 4 hours, would you be happy to hear, say, that it might take 10 hours with 10% probability?

**Worst case w.h.p.** So, what bound can be placed on the running time that holds with high probability? For chaining, simply apply the Chernoff bound. Given that a bin contains μ=O(1) elements in expectation, the probability that it contains Z elements is at most e^{Z-μ}·(μ/Z)^{Z} = (O(1)/Z)^{Z}. To make this be n^{-c}, set Z=Θ(lg n/lglg n).

Observe that this bound is tight. There are (n choose Z) ≈ n^{Z}/Z! combinations of keys that can land in a designated bin, and some Z keys land there with probability b^{-Z}=(2n)^{-Z}. Thus, in expectation, at least one bin will see Z=Ω(lg n/lglg n) elements.
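This balls-in-bins behavior is easy to observe experimentally. Below is a quick sketch (the `max_load` helper and the parameters are illustrative, not from the post) that throws n balls into b = 2n bins and reports the fullest bin:

```python
import math
import random

def max_load(n, b, trials=20):
    """Throw n balls into b bins, several times; return the fullest bin seen.
    Illustrative experiment: with b = 2n, the maximum should grow like
    lg n / lglg n rather than stay at O(1)."""
    worst = 0
    for _ in range(trials):
        bins = [0] * b
        for _ in range(n):
            bins[random.randrange(b)] += 1
        worst = max(worst, max(bins))
    return worst

n = 1 << 14
print(max_load(n, 2 * n))                   # typically a small single-digit value
print(math.log(n) / math.log(math.log(n)))  # the lg n / lglg n scale, for comparison
```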

For linear probing, we can also reduce this to a balls-in-bins analysis, by defining a bin as L=O(lg n) consecutive locations in the array. In the running example b=2n, such a bin is *full* when Z=2μ=L. By Chernoff, this happens with probability e^{Z-μ}·(μ/Z)^{Z} = (e/4)^{L/2}. Thus, the maximal run is O(lg n) w.h.p. Again, this is tight by a direct calculation.

**Amortized bounds.** Unfortunately, these worst-case^{1} bounds are not too satisfactory either, since O(lg n) is trivial to get with binary trees. If you grant me some lenience in my modeling, I will prove an O(1) amortized bound w.h.p., which I believe explains the power of the algorithms much better.
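The O(lg n) bound on the maximal run can likewise be observed directly. A sketch (the `longest_run` helper is illustrative; Python's `randrange` stands in for a truly random hash function):

```python
import random

def longest_run(n, b):
    """Insert n keys into a size-b array with linear probing and return the
    longest contiguous run of occupied cells. Illustrative sketch; for
    simplicity, runs that wrap around the end of the array are ignored."""
    table = [False] * b
    for _ in range(n):
        i = random.randrange(b)
        while table[i]:
            i = (i + 1) % b  # probe the next cell
        table[i] = True
    best = cur = 0
    for occupied in table:
        cur = cur + 1 if occupied else 0
        best = max(best, cur)
    return best

n = 1 << 14
print(longest_run(n, 2 * n))  # grows like lg n, not like n
```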

Formally, I will prove the following, rather trivial statement:

Let T ≥ c·lg n, for a large enough constant c. Any T operations in chaining hashtables only touch O(T) memory w.h.p.

Fix the hash codes of the T elements, and define our "bin" to consist of these ≤T distinct hash codes. All other elements are still totally random, and we expect μ ≤ Tn/b = Θ(lg n) of them to fall into the bin. If μ ≤ Z/(2e), the Chernoff bound is e^{Z-μ}·(μ/Z)^{Z} ≤ 2^{-μ} = high probability.

^{1} There is an unfortunate tendency of some TCS papers to use "worst case" when they really mean deterministic. I am much happier with the convention that worst case is the opposite of amortized, at least when your paper has any connection to data structures.

But does this actually mean that chaining has O(1) amortized running time? Formally, no: if I repeat a single query T times, the running time will be T times the running time of that query, i.e. a geometric random variable with no good concentration. Here is where I invoke a bit of lenience in my modeling: in practice, it is ridiculous to worry about repeating one of the last O(lg n) operations! The memory used by these recent updates will be fresh in the first level of cache, making a repetition cost essentially zero. (One may formally say that chaining has amortized O(1) running time w.h.p. in the external memory model with a cache of size Ω(lg n).)

A pretty way to understand the amortized bound is as a "buffer size" guarantee. The most demanding applications of hashtables are in analyzing a continuous stream of data, when operations need to be super-fast to keep up with the line speed. In such applications, if the design is at all sensible, there will be a buffer between the network interface and our CPU. The goal is not necessarily to take O(1) time for every single operation, but to keep the buffer small. Our proof says that the buffer will not grow to more than T=O(lg n) w.h.p., if you can afford the average time per operation.

For linear probing, we can instead show:

Let T = Ω(lg^{3} n). Any T operations in linear probing hashtables only touch O(T) memory w.h.p.

Remember from last time that we analyzed linear probing by building a binary tree over the array, and bounding the number of nodes that become dangerous (two-thirds full).

Let μ be the number of keys we expect under some node. First of all, if μ ≫ lg *n*, we do not need to worry about the node: it doesn't become dangerous w.h.p. Otherwise, we showed that the node becomes dangerous with probability O(1/μ^{2}); if it does, we will pay a cost of μ.

Looking at T elements, I am dealing with ≤T nodes on each level, and I expect O(T/μ^{2}) nodes to be dangerous. As long as T/μ^{2} ≥ *c* lg *n*, Chernoff tells me that only O(T/μ^{2}) nodes are dangerous w.h.p. Since we only deal with μ = O(lg n), I needed to set T = Ω(lg^{3} n). With this bound, the number of memory locations accessed by the T operations is Σ_{μ=2^i} O(T/μ^{2})·μ = O(T) w.h.p.

I end with a question to my knowledgeable readers. By being more careful, I can prove that T = Ω(lg^{1+ε} n) suffices for linear probing. Is it possible to prove T = Ω(lg n), as in chaining? (Perhaps I'm missing something obvious.)

Posted by Mihai at 8:55 PM