Tuesday, February 2, 2010

Cuckoo Hashing

Today, we will prove bounds for the basic cuckoo hashing. We are going to place our n keys into two arrays A[1..b] and B[1..b], where b=2n. For this, we will use two hash functions, h and g. When some element x arrives, we first try to put it in A[h(x)]. If that location already contains some element y, try to move y to B[g(y)]. If that location already contains some z, try to move z to A[h(z)], and so on until you find a free spot.

The proper image to have in mind is that of a random bipartite graph. The graph will have b nodes on each side, corresponding to the locations of A and B. In this view, a key x is an edge from the left vertex h(x) to the right vertex g(x).

Simple paths. As a warm-up, let's deal with the case of simple paths: upon inserting x, the update path finds an empty spot without intersecting itself. It turns out that the update time of cuckoo hashing behaves like a geometric random variable:

The probability that insert(x) traverses a simple path of length k is 2-Ω(k).
I will prove this by a cute encoding analysis (you know I like encoding proofs). Let's say you want to encode the two hash codes for each of the n keys. As the hash functions h and g are truly random, this requires H=2nlg b bits on average (the entropy). But what if, whenever some event E happened, I could encode the hash codes using H-Δ bits? This would prove that Pr[E]=O(2): there are only O(2H-Δ) bad outcomes that lead to event E, out of 2H possible ones. Thus, the task of proving a probability upper bound becomes the task of designing an algorithm.

In our case, E={insert(x) traverses a simple path of length k} and we will achieve a saving of Δ=Ω(k). Here is what we put in the encoding:
  • one bit, saying whether the path grows from A[h(x)] or B[g(x)];
  • the value k, taking O(lg k) bits;
  • all edges of the path, in order, taking (k-1)lg n bits.
  • all vertices of the path, in order, taking (k+1)lg b bits.
  • the hash codes for all keys not on the path, in order, taking (n-k)·2lg b bits.
Observe that the hash codes of the k keys on the path are specified using essentially half the information, since a hash code is shared between two edges. Since lg n = lg(b/2) = lg b - 1, the encoding saves Δ=k-O(lg k) bits compared to H.

The intuition for why a k-path occurs with probability 2-Ω(k) is simple. Say I've reached edge y and I'm on the right side. Then, the probability that B[g(y)] is collision free is at least 1/2, since there are only n keys mapped to a space of 2n. In other words, at each point the path stops with probability half. This is exactly what the encoding is saying: we can save one bit per edge, since it takes lgn to encode an edge, but lg(2n) to encode an endpoint.

One cycle. Let us now deal with the case that the connected component of x contains one cycle. It is tempting to say that cuckoo hashing fails in this case, but it does not. Here is what might happen to the update path in case a cycle is part of the component (see figure):
  1. the path luckily avoids the cycle and finds a free location without intersection itself. Cool.
  2. the path reaches B[g(x)], which is occupied by some key y. Note that this has closed the cycle through the x edge, but the x edge is not actually traversed. Following y to A[h(y)] must eventually reach a free spot (no more cycles).
  3. the path intersects itself. Then, it will start backtracking, flipping elements back to their position before Insert(x). Eventually, it reaches A[h(x)], where the algorithm had originally placed x. Following the normal cuckoo rules, x is moved to B[g(x)] and the exploration from there on must find an empty spot.
As before, we want to show that an update takes time k with probability 2-Ω(k). In cases 1 and 2, we can simply apply the encoding from before, since the path is simple. In case 3, let ℓ be the number of edges until the path meets itself. The number of edges after B[g(x)] is at least k-2ℓ. Thus, we have a simple path from x of length max{ℓ, k-2ℓ} = Ω(k), so the old argument applies.

Two cycles. We now arrive at the cases when cuckoo hashing really fails: the bipartite graph contains as a subgraph (1) a cycles with a chord; or (2) two cycles connected by a path (possibly a trivial path, i.e. the cycles simply share a vertex).
From the figure we see that, by removing two edges, we can always turn the bad subgraph into two paths starting at x. We first encode those two paths as above, saving Ω(k), where k=size of the subgraph. Now we can add to the encoding the two infringing edges. For each, we can specify its identity with lg n bits, and its two end points with O(lg k) bits (a lower order loss compared to
the Ω(k) saving). In return, we know their two hash codes, which are worth 4lg b bits. Thus, the overall saving is at least 2lg n bits.

We have shown that an insertion fails with probability O(1/n2). By a union bound, cuckoo hashing will handle any fixed set of n elements with probability 1-O(1/n).

This bound is actually tight. Indeed, if three keys x,y,z have h(x)=h(y)=h(z) and g(x)=g(y)=g(z), then cuckoo hashing fails (this is the simplest obstruction subgraph). But such a bad event happens with probability (n choose 3)·b2 / b6 = Θ(1/n).

6 comments:

Pat Morin said...

This is a nice idea that looks like another application of "Moser's Method," but there's something I don't quite get.

In your encoding proof, H is entropy of the bitstring h(x_1),g(x_1),...,h(x_n),g(x_n).

Can you recover this bitstring from the encoding? As far as I can see, you can discover which edges are in the Cuckoo graph, but not which keys generated those edges. Or did I miss something?

Mihai said...

I haven't looked at Moser's paper, but, given the blog-hype it received, I sincerely hope he does something smarter than just expressing probabilities in encoding terminology. This is a trivial idea that I presume many people had.

As for the encoding, an edge is identical to a key. In other words, I may assume that the keys are {1,...,n}. You notice how I always say that I am encoding an edge with lg n bits.

To put it in your terminology, there are n edges h(x_i)--g(x_i). I can encode which edge is involved in some subgraph using lg n bits. So I do indeed recover the h and g values for all edges.

Pat Morin said...
This comment has been removed by the author.
Pat Morin said...

Ah, ok. I see it now. This is the part I missed:

all edges of the path, in order, taking (k-1)lg n bits

I didn't understand that this gives you the key that generated each edge, so you can fully recover the functions g and h.

Thanks,
Pat

Pat Morin said...

By the way Moser's proof is essentially an encoding proof. It basically shows that, after some startup cost, each step of the algorithm compresses k random bits into log(dk) bits. So as long as log(dk) < k, there had better not be too many steps.

Mihai said...

Glad you figured it out!

Again, I believe Moser's paper got its hype because it did something smart in its handling of the Lovász lemma (which I don't understand, since I didn't think about this problem -- which is probably also the case for 90% of the hype-generators, but that's another story :). It's certainly not the case that this paper is the root of all encoding proofs :)