Tuesday, September 21, 2010

Static 1D Range Reporting

Method 4 for implementing van Emde Boas with linear space, described in my last post, is due to [Alstrup, Brodal, Rauhe: STOC'01]. They worked on static range reporting in 1 dimension: preprocess a set of integers S, and answer query(a,b) = report all points in S ∩ [a,b]. This is easier than predecessor search: you can first find the predecessor of a and then output points in order until you exceed b. Using van Emde Boas, we would achieve a linear-space data structure with query time O(lglg u + k), where k is the number of points to be reported.


Alstrup, Brodal, and Rauhe showed the following surprising result:
Static 1D range reporting can be solved with O(n) space and O(1+k) query time.
I like this theorem a lot, since it is so easy to describe to anybody with minimal background in Computer Science, yet the result is not obvious. I have used it many times to answer questions like, "Tell me a surprising recent result from data structures."

The solution. We need a way to find some (arbitrary) key from S ∩ [a,b] in constant time. Once we have that, we can walk left and right in an ordered list until we go outside the interval.

Let's first see how to do this with O(n lg u) space; this was described by [Miltersen, Nisan, Safra, Wigderson: STOC'95]. Of course, we build the trie representing the set. Given the query [a,b] let us look at the lowest common ancestor (LCA) of a and b. Note that LCA(a,b) is a simple mathematical function of the integers a and b, and can be computed in constant time. (The height of the LCA is the most significant set bit in a xor b.)
  • if LCA(a,b) is a branching node, look at the two descendant branching nodes. If the interval [a,b] is nonempty, it must contain either the max in the tree of the left child, or the min in the tree of the right child.

  • if LCA(a,b) is an active node, go to its lowest branching ancestor, and do something like the the above.

  • if LCA(a,b) is not an active node, the interval [a,b] is certainly empty!
Thus, it suffices to find the lowest branching ancestor of LCA(a,b) assuming that LCA(a,b) is active. This is significantly easier than predecessor search, which needs the lowest branching ancestor of an arbitrary node.

The proposal of [Miltersen et al.] is to store all O(n lg u) active nodes in a hash table, with pointers to their lowest branching ancestors.

As in my last post, the technique of [Alstrup et al.] to achieve O(n) space is: store only O(nlg u) active nodes, and store them in a retrieval-only dictionary with O(lglg u) bits per node. We store the following active nodes:
  1. active nodes at depth i·√lg u ;
  2. active nodes less than √lg u levels below a branching node.
We first look up LCA(a,b) in the dictionary. If the lowest branching ancestor is less than √lg u levels above, LCA(a,b) is in the dictionary and we find the ancestor. If not, we truncate the depth of the LCA to a multiple of √lg u , and look up the ancestor at that depth. If [a,b] is nonempty, that ancestor must be an active node and it will point us to a branching ancestor.

No comments: