Today, we will prove bounds for basic cuckoo hashing. We are going to place our *n* keys into two arrays A[1..b] and B[1..b], where b=2n. For this, we will use two hash functions, *h* and *g*. When some key *x* arrives, we first try to put it in A[h(x)]. If that location already contains some key *y*, try to move *y* to B[g(y)]. If that location already contains some key *z*, try to move *z* to A[h(z)], and so on until a free spot is found.
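Here is a minimal sketch of the insertion loop just described, assuming the tables are Python lists with `None` marking free slots and that `h` and `g` map keys to 0..b-1; the `max_loop` cutoff is my addition, corresponding to the genuine failure case analyzed at the end.

```python
# Illustrative sketch, not a production implementation: h and g are assumed
# to be functions from keys to table indices; empty slots hold None.
def insert(A, B, h, g, x, max_loop=500):
    for _ in range(max_loop):
        i = h(x)
        A[i], x = x, A[i]        # put x into A[h(x)], evicting any occupant
        if x is None:            # the slot was free: done
            return True
        j = g(x)
        B[j], x = x, B[j]        # move the evicted key to B[g(x)]
        if x is None:
            return True
    return False                 # we seem to be stuck in a loop; rehash
```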

The proper image to have in mind is that of a random bipartite graph. The graph will have *b* nodes on each side, corresponding to the locations of A and B. In this view, a key *x* is an edge from the left vertex h(x) to the right vertex g(x).

**Simple paths.** As a warm-up, let's deal with the case of simple paths: upon inserting *x*, the update path finds an empty spot without intersecting itself. It turns out that the update time of cuckoo hashing behaves like a geometric random variable:

The probability that insert(*x*) traverses a simple path of length *k* is 2^{-Ω(k)}.

I will prove this by a cute encoding analysis (you know I like encoding proofs). Let's say you want to encode the two hash codes for each of the *n* keys. As the hash functions *h* and *g* are truly random, this requires H = 2*n* lg *b* bits on average (the entropy). But what if, whenever some event *E* happened, I could encode the hash codes using H-Δ bits? This would prove that Pr[*E*] = O(2^{-Δ}): there are only O(2^{H-Δ}) bad outcomes that lead to event *E*, out of 2^{H} possible ones. Thus, the task of proving a probability upper bound becomes the task of designing an algorithm.
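Spelled out, the counting behind that claim (a routine step, made explicit here): the hash codes are uniform over 2^{H} outcomes, so

```latex
\Pr[E] \;=\; \frac{|E|}{2^{H}}
       \;\le\; \frac{O(2^{H-\Delta})}{2^{H}}
       \;=\; O(2^{-\Delta}).
```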

In our case, *E* = { insert(*x*) traverses a simple path of length *k* } and we will achieve a saving of Δ = Ω(k). Here is what we put in the encoding:

- one bit, saying whether the path grows from A[h(x)] or B[g(x)];
- the value *k*, taking O(lg *k*) bits;
- all edges of the path, in order, taking (k-1) lg *n* bits;
- all vertices of the path, in order, taking (k+1) lg *b* bits;
- the hash codes for all keys not on the path, in order, taking (n-k)·2 lg *b* bits.

The hash codes of the *k* keys on the path are thus specified using essentially half the information, since a hash code is shared between two edges. Since lg *n* = lg(b/2) = lg *b* - 1, the encoding saves Δ = k - O(lg k) bits compared to H.
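To double-check the arithmetic, here is my tally of the pieces listed above, using lg *n* = lg *b* - 1:

```latex
\underbrace{1 + O(\lg k)}_{\text{direction, value of } k}
  + \underbrace{(k-1)\lg n + (k+1)\lg b}_{\,=\,2k\lg b - (k-1)}
  + (n-k)\cdot 2\lg b
  \;=\; 2n\lg b - k + O(\lg k) \;=\; H - k + O(\lg k).
```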

The intuition for why a *k*-path occurs with probability 2^{-Ω(k)} is simple. Say I've reached edge *y* and I'm on the right side. Then, the probability that B[g(y)] is collision free is at least 1/2, since there are only *n* keys mapped to a space of 2*n*. In other words, at each point the path stops with probability at least a half. This is exactly what the encoding is saying: we can save one bit per edge, since it takes lg *n* bits to encode an edge, but lg(2n) bits to encode an endpoint.
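This decay is easy to see experimentally. The sketch below is not from the post; the names (`path_length`, `trial`) and parameters are mine, and the cap just guards against the rare unbounded loops discussed next:

```python
# Simulate n insertions into tables of size b = 2n with truly random hash
# codes, and tally how often each path length occurs; the counts should
# drop off roughly geometrically. Illustrative sketch only.
import random
from collections import Counter

def path_length(A, B, h, g, x, cap=10_000):
    steps, side, key = 0, 0, x
    while steps < cap:
        table, idx = (A, h[key]) if side == 0 else (B, g[key])
        table[idx], key = key, table[idx]   # evict the current occupant
        steps += 1
        if key is None:                     # found a free slot
            break
        side = 1 - side                     # evictee goes to the other table
    return steps

def trial(n=1000):
    b = 2 * n
    h = [random.randrange(b) for _ in range(n)]  # hash codes for keys 0..n-1
    g = [random.randrange(b) for _ in range(n)]
    A, B = [None] * b, [None] * b
    return Counter(path_length(A, B, h, g, x) for x in range(n))

for length, count in sorted(trial().items()):
    print(length, count)
```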

**One cycle.** Let us now deal with the case where the connected component of *x* contains one cycle. It is tempting to say that cuckoo hashing fails in this case, but it does not. Here is what might happen to the update path when a cycle is part of the component (see figure):

- the path luckily avoids the cycle and finds a free location without intersecting itself. Cool.
- the path reaches B[g(x)], which is occupied by some key *y*. Note that this has closed the cycle through the *x* edge, but the *x* edge is not actually traversed. Following *y* to A[h(y)] must eventually reach a free spot (no more cycles).
- the path intersects itself. Then, it will start backtracking, flipping elements back to their positions before insert(*x*). Eventually, it reaches A[h(x)], where the algorithm had originally placed *x*. Following the normal cuckoo rules, *x* is moved to B[g(x)] and the exploration from there on must find an empty spot.

We claim that, even in this case, the insertion traverses a path of length *k* with probability 2^{-Ω(k)}. In cases 1 and 2, we can simply apply the encoding from before, since the path is simple. In case 3, let ℓ be the number of edges until the path meets itself. The number of edges after B[g(x)] is at least k-2ℓ. Thus, we have a simple path from *x* of length max{ℓ, k-2ℓ} = Ω(k) (one of the two terms is at least k/3), so the old argument applies.

**Two cycles.** We now arrive at the cases when cuckoo hashing really fails: the bipartite graph contains as a subgraph (1) a cycle with a chord; or (2) two cycles connected by a path (possibly a trivial path, i.e. the cycles simply share a vertex).

From the figure we see that, by removing two edges, we can always turn the bad subgraph into two paths starting at *x*. We first encode those two paths as above, saving Ω(k), where k is the size of the subgraph. Now we can add to the encoding the two infringing edges. For each, we can specify its identity with lg *n* bits, and its two endpoints with O(lg k) bits (a lower order loss compared to the Ω(k) saving). In return, we learn the two hash codes of each, which are worth 4 lg *b* bits in total. Thus, the overall saving is at least 2 lg *n* bits.
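In bits, the trade works out as follows (my summary): naming the two edges and their four endpoints costs 2 lg *n* + O(lg k), while their four hash codes are worth 4 lg *b* = 4 lg *n* + 4, so on top of the Ω(k) saved on the two paths,

```latex
\text{saving} \;\ge\; \Omega(k) + (4\lg n + 4) - 2\lg n - O(\lg k) \;\ge\; 2\lg n,
```

which the encoding argument converts into a failure probability of O(2^{-2 lg n}) = O(1/n^{2}), as claimed next.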

We have shown that an insertion fails with probability O(1/n^{2}). By a union bound, cuckoo hashing will handle any fixed set of *n* elements with probability 1 - O(1/n).

This bound is actually tight. Indeed, if three keys x, y, z have h(x)=h(y)=h(z) and g(x)=g(y)=g(z), then cuckoo hashing fails (this is the simplest obstruction subgraph). But such a bad event happens with probability (n choose 3)·b^{2}/b^{6} = Θ(1/n).
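The arithmetic, spelled out: for a fixed triple there are b choices for the shared h-value and b for the shared g-value, out of b^{3}·b^{3} equally likely assignments of the six hash codes, so

```latex
\binom{n}{3}\cdot\frac{b^{2}}{b^{6}}
  \;=\; \Theta\!\left(\frac{n^{3}}{b^{4}}\right)
  \;=\; \Theta\!\left(\frac{n^{3}}{(2n)^{4}}\right)
  \;=\; \Theta\!\left(\frac{1}{n}\right).
```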

## 6 comments:

This is a nice idea that looks like another application of "Moser's Method," but there's something I don't quite get.

In your encoding proof, H is the entropy of the bitstring h(x_1), g(x_1), ..., h(x_n), g(x_n).

Can you recover this bitstring from the encoding? As far as I can see, you can discover which edges are in the Cuckoo graph, but not which keys generated those edges. Or did I miss something?

I haven't looked at Moser's paper, but, given the blog-hype it received, I sincerely hope he does something smarter than just expressing probabilities in encoding terminology. This is a trivial idea that I presume many people had.

As for the encoding, an edge is identical to a key. In other words, I may assume that the keys are {1,...,n}. You notice how I always say that I am encoding an edge with lg n bits.

To put it in your terminology, there are n edges h(x_i)--g(x_i). I can encode which edge is involved in some subgraph using lg n bits. So I do indeed recover the h and g values for all edges.

Ah, ok. I see it now. This is the part I missed:

"all edges of the path, in order, taking (k-1) lg n bits"

I didn't understand that this gives you the key that generated each edge, so you can fully recover the functions g and h.

Thanks,

Pat

By the way, Moser's proof is essentially an encoding proof. It basically shows that, after some startup cost, each step of the algorithm compresses k random bits into log(dk) bits. So as long as log(dk) < k, there had better not be too many steps.

Glad you figured it out!

Again, I believe Moser's paper got its hype because it did something smart in its handling of the Lovász local lemma (which I don't understand, since I didn't think about this problem -- which is probably also the case for 90% of the hype-generators, but that's another story :). It's certainly not the case that this paper is the root of all encoding proofs :)
