Last time we showed that chaining and linear probing enjoy O(1) expected running times per operation. However, this guarantee is fairly weak, and certainly does not explain the popularity of the schemes. If you had a computation that was expected to run for 4 hours, would you be happy to hear, say, that it might take 10 hours with 10% probability?
Worst case w.h.p. So, what bound can be put on the running time that hold with high probability? For chaining, simply apply the Chernoff bound. Given that a bin contains μ=O(1) elements in expectation, the probability that it contain Z elements is at most eZ-μ / (μ/Z)Z = (O(1)/Z)Z. To make this be n-c, set Z=Θ(lg n/lglg n).
Observe that this bound is tight. There are (n choose Z) ≈ nZ/Z! combinations of keys that can land in a designated bin, and some Z keys land there with probability b-Z=(2n)-Z. Thus, in expectation, at least one bin will see Z=Ω(lg n/lglg n) elements.
For linear probing, we can also reduce this to a balls-in-bins analysis, by defining a bin as L=O(lg n) consecutive locations in the array. In the running example b=2n, such a bin is full when Z=2μ=L. By Chernoff, this happens with probability eZ-μ / (μ/Z)Z = (e/4)L/2. Thus, the maximal run in O(lg n) w.h.p. Again, this is tight by a direct calculation.
Amortized bounds. Unfortunately, these worst-case1 bounds are not too satisfactory either, since O(lg n) is trivial to get by binary trees. If you grant me some lenience for my modeling, I will prove an O(1) amortized bound w.h.p., which I believe explains the power of the algorithms much better.
1 There is an unfortunate tendency of some TCS papers to use "worst case" when they really mean deterministic. I am much happier with the convention that worst case is opposite of amortized, at least when your paper has any connection to data structures.Formally, I will prove the following, rather trivial statement:
Let T≥clg n, for a large enough constant c. Any T operations in chaining hashtables only touch O(T) memory w.h.p.Fix the hash codes of the T elements, and define our "bin" to consist of these ≤T distinct hash codes. All other elements are still totally random, and we expect μ≤Tn/b=Θ(lg n) to fall into the bin. If μ≤Z/(2e), the Chernoff bound is eZ-μ/(μ/Z)Z ≤ 2-μ = high probability.
But does this actually mean that chaining has O(1) amortized running time? Formally, no: if I repeat a single query T times, the running time will be T times the running time of that query, i.e. a geometric random variable with no good concentration. Here is where I invoke a bit of lenience in my modeling: in practice, it is ridiculous to worry about repeating one of the last O(lg n) operations! The memory used by these recent updates will be fresh in the first level of cache, making a repetition cost essentially zero. (One may formally say that chaining has amortized O(1) running time w.h.p. in the external memory model with a cache of size Ω(lg n).)
A pretty way to understand the amortized bound is as a "buffer size" guarantee. The most demanding applications of hashtables are in analyzing a continuous stream of data, when operations need to be super-fast to keep up with the line speed. In such application, if the design is at all sensible, there will be a buffer between the network interface and our CPU. The goal is not necessarily to take O(1) time for every single operation, but to keep the buffer small. Our proof says that the buffer will not grow to more than T=O(lg n) w.h.p., if you can afford the average time/operation.
For linear probing, we can instead show:
Let T=Ω(lg3n). Any T operations in chaining hashtables only touch O(T) memory w.h.p.Remember from last time that we analyzed linear probing by building a binary tree over the array, and bounding the number of nodes that become dangerous (two-thirds full).
Let μ be the number of keys we expect under some node. First of all, if μ≫lg n, we do not need to worry about the node: it doesn't become dangerous w.h.p. Otherwise, we showed that the node becomes dangerous with probability O(1/μ2); if it does, we will pay a cost of μ.
Looking at T elements, I am dealing with ≤T nodes on each level, and I expect O(T/μ2) nodes to be dangerous. As long as T/μ2 ≥ clg n, Chernoff tells me that only O(T/μ2) nodes are dangerous w.h.p. Since we only deal with μ=O(lg n), I needed to set T=Ω(lg3n). With this bound, the number of memory locations accessed by the T operations is Σμ=2^i O(T/μ2)·μ = O(T) w.h.p.
I end with a question to my knowledgeable readers. By being more careful, I can prove T=Ω(lg1+εn) suffices for linear probing. Is it possible to prove T=Ω(lg n), as in chaining? (Perhaps I'm missing something obvious.)