## Tuesday, August 7, 2007

### Binary Search Trees (II): Lower Bounds

In the first post on Binary Search Trees (BSTs), I talked about a geometric view of the following optimization problem: given an access sequence (x1, .., xm), what is the best running time achievable by a dynamic BST on this sequence?

The typical approach to get an approximation algorithm is to identify a lower bound on the cost, and match it as close as possible. Traditionally, we have known two lower bounds for the BST problem, coming from [Robert E. Wilber: Lower Bounds for Accessing Binary Search Trees with Rotations. SICOMP 18(1): 56-67, 1989; FOCS'86].

In the rank-time geometric view, these two bounds and any other bounds we have been able to think of fall in a framework that we call cut bounds. Definitions:

• a cut is a vertical segment from some (x,y) to (x,Y), say y < Y.

• a cut system is an ordered list of cuts C1, C2, ... It will follow that one can assume w.l.o.g. that cuts are ordered by increasing Y, breaking ties by increasing x.

• for some cut Ci defined by (xi,yi)--(xi,Yi) a point (x, y) is in range of the cut if yiy < Yi.

• the point (x, y) is accessible to the cut Ci if it is in range of the cut, and for every cut Cj, j < i, that the point is in range of, the point is between Ci and Cj, i.e. min{xi, xj} ≤ x ≤ max{xi, xj}. In the example below, any point in the hashed area is inaccessible to C7.
• for every Ci, sort all accessible points of the access sequence by y coordinate. Let Li be the numbers of transitions in this sorted list between points on the left of Ci and points on the right. In the example, L7 = 4:
Theorem: Cost of best BST = Omega(sum Li)
Proof: There is a very meaningful proof, to be discussed in a later post. --QED

Thus, the lower bound proven by some cut system is sum Li. Wilber I is obtained by a cut system which is independent of the access sequence (discussion in a future post). Wilber II is obtained by considering a cut starting at every point, and going up to -∞. If you think about it, for each point (=cut), it counts the number of left-right transitions in a sequence of points converging towards it, when looking up (back in time):This is annoyingly close (but not quite) to the greedy IAN. The cost of the greedy is essentially equal to the total number of points in a converging sequence, while the lower bound is equal to the number of alternations. Despite the similarity, we do not know any bound on the ratio of this upper and lower bound, and we do not know any separation.

The optimal cut bound. Since all lower bounds we know are cut bounds, the obvious question is to find the best bound achievable in this framework. For a given access sequence, define the following undirected graph:
• the nodes are the unsatisfied rectangles (rectangles with two access points at opposite corners, containing no other point)

• there is an edge between two rectangles iff one contains in its interior a corner of the other. Note this corner must be empty, otherwise the rectangle is satisfied and is not a node. Thus, there are only two possible cases for rectangles joined by an edge:We call the rectangles on the left positive-type and those on the right negative-type.
Theorem: The highest possible cut bound = the maximum independent set in the graph of rectangles.

Proof of
"": For any cut bound, we find an independent set of the same size: take every transition across a cut, and draw the unsatisfied rectangle defined by the points on the left and right. (5 minutes with pencil and paper will convince you the rectangle are independent)

Proof of
"": For any independent set, we need to construct a cut giving the same bound. For every rectangle, consider a cut with the same vertical range, that is close to the left edge for negative-type rectangle, and close to the right edge for positive-type. Order cuts by increasing Y, breaking ties by increasing x. Each cut will have a lower bound of exactly 1, paying for the rectangle (5 more minutes with pencil and paper). --QED

Note a very pleasing thing about the maximum independent set: it is flip- and rotation-invariant. Rectangle satisfiability is by definition flip- and rotation-invariant, so the optimal BST must have this property. Thus, whatever the right lower bound is, it must have this property also.

Now, it does not seem too useful to know that the best lower bound is a maximum independent set of some geometric graph, since that is both too hard to compute and too difficult to reason about. We have a neat greedy that approximates the maximum independent set within constant factors (it is similar to the IAN greedy, but taylored to this problem). Thus, the best lower bound is rather explicit. Unfortunately, we cannot construct a mathcing upper bound.

Open problems:
• can we come up with any non-cut lower bounds? (that are better?)
• is Wilber II asymptotically equal to the best bound (the maximum independent set)? Wilber conjectured so.
• does IAN come anywhere close to the best lower bound? is there a separation?