Thursday, August 9, 2007

Binary Search Trees (III): Online

So far I have talked about the following optimization problem: given a sequence of accesses x=(x1,..,xm), compute the minimum cost needed by a dynamic BST to access those elements. This is an interesting enough problem (somewhat unusually, it is a Computer Science problem about Computer Science), and we should strive to solve it.

But the study of BSTs would not end with solving this problem: step 2 would be to get an online BST algorithm. This is the kind of "binary search trees" that we're all acustomed to: the algorithm gets an access request, does something, gets the next access, does something etc. Many, many online BST algorithms have been proposed, like red-black trees, AVL, 2-3 trees, BB[alpha] etc. Most of these only guarantee that the worst-case access takes O(lg n), i.e. they only try to keep the tree balanced.

The goal would be to not only solve the philosophical question of approximating the best possible cost for a given access sequence, but have an actual real tree that we can use, which always takes time close to the minimum possible cost.

Competitive analysis (background for nontheorists). Formally, our new problem is the field of online algorithms and competitive analysis. A competitive ratio of c means that for any access sequence x, your online algorithm (which sees accesses one at a time) does at most c times worse than the best possible cost for the sequence x. In other words, a c-competitive algorithm is in particular a c-approximation algorithm, with the bonus that it handles each access without knowing the future.

Note that it is not at all clear that competitive algorithms exist; if you don't know the future, what chance do you have to compete with someone who does? Competitive analysis is a well-studied field; for some problems good competitive algorithms exist, and for others we have impossibility results.

The metaphor for this field is the snowboard rental problem. Snowboards cost $30 to buy, and $1/day to rent. You do not know how many days of snow you'll have this year. What do you do? [Answer: rent for 30 days, then buy if there's still snow on the 31st day. If you are a little green man with perfect whether forecast, you pay min{D,30}, where D=number of days with snow. By this algorithm, you pay at most 2*min{D,30}, i.e. you are 2-competitive. More insight.]
Competitive BSTs. The obvious bound is O(lg n)-competitiveness, since the worst-case is logarithmic for any balanced tree. How about something better?
  • splay trees [Sleator, Tarjan: Self-Adjusting Binary Search Trees, J. ACM 1985; STOC'83] are conjectured to be O(1)-competitive, but nobody has shown any o(lg n) bound. More in post IV.

  • the IAN greedy is also conjectured to be O(1)-competitive but nothing is known (not even a logarithmic bound). See below.

  • Tango trees [Demaine, Harmon, Iacono, Patrascu: Dynamic Optimality--Almost, SICOMP 2007; FOCS'04] give the first proved result: O(lglg n)-competitiveness. They are also the best known approximation algorithm for the offline problem. See post V.
Historically, O(1)-competitiveness is called "dynamic optimality". The "dynamic optimality conjecture" claims that splay trees in particular have this property.

Is IAN Online?? Good question. Both in Ian Munro's paper, and in an older paper that introduced the algorithm independently [Joan Marie Lucas, Canonical forms for competitive binary search tree algorithms, Rutgers technical report DCS-TR-250], it is an offline algorithm called "order by next request". Since you are reordering the elements considering future accesses, this is really offline. However, by the black magic of interpreting the algorithm as a greedy in the geometric view, and a more complicated conversion back from the geometric problem into a tree, we can show that the algorithm can be made online with O(1) slowdown. This is surprising: the old conjecture that IAN is an O(1)-approximation algorithm is actually a conjecture that it is dynamically optimal.

So there is another contender for the dynamic optimality title, competing with splay trees.

11 comments:

Anonymous said...

I know this post is quite old but I hope someone can help me on this one. I am trying to teach myself about this Online Binary Search topic and I have read a few papers on this topic and I am familiar with the fundamental concept about BST but it is my first time to focus on topics like competitive analysis and online algorithms. I am a little lost on what does it mean by "nobody has shown any o(lg n) bound" in your post about Splay trees? My best guess on this is that it means that nobody has shown that there is an offline BST with an OPT(X) = o(lg n)? I hope someone can help me understand the problem. Thanks.

Mihai said...

Actually, no. The ultimate goal is to show that for any access sequence X, the cost of splay trees on X is O(OPT on X). For some X, OPT(X) = |X|*lg n, or course, and no tree can achieve o(lg n) in the worst case (a tree has branching factor 2 and must reach an arbitrary key in a set of n keys).

Now if we cannot prove that (\forall X) SPLAYcost(X) \le O(1)* OPTcost(X), we might want to replace the O(1) with some other small function. O(lg n) is trivial here since OPT must have cost at least 1 per access, and splay trees do achieve an O(lg n) upper bound (as do many mane other trees). But the gap between O(lgn) and the conjectured O(1) is open.

Anonymous said...

Thank you for your immediate response on my question, though I still find things a little confusing. So it means that there is no single value or function that would represent OPTcost(X)?

Mihai said...

In fact, computing OPT(X) for a given X might easily be NP-hard (though I am not aware of a proof, there have been efforts to show this).

Mihai said...

Consider two simple examples:
1. If X is random, OPT(X) = |X|lg n.
2. If X= (1,2,3, ..., n), OPT(X) = O(n).

This shows that OPT(.) is a nontrivial function.

Anonymous said...

If that is so, then how can is it possible to know the competitive ratio/competitiveness of a given tree? Because I have read something about competitive ratio (c) that states that c(A) = (\forall X) max{costA(X)/costOPT(X)}. If costOPT(X) cannot easily be computed then how can I come up with the competitiveness of A?

Anonymous said...

If that is so, then how can is it possible to know the competitive ratio/competitiveness of a given tree? Because I have read something about competitive ratio (c) that states that c(A) = (\forall X) max{costA(X)/costOPT(X)}. If costOPT(X) cannot easily be computed then how can I come up with the competitiveness of A?

Mihai said...

We only want to determine competitivness asymptotically. So if OPT(X) could be approximated up to a constant, there is no problem.

Anonymous said...

Thank you for your time, I will continue reading more about this topic. And I also find the geometric interpretation of BST execution interesting though I cannot say I totally grasp everything. Just one last question, where does the Wilbur I and II bound fit in the picture?

Mihai said...

Wilber 1 and 2 are lower bounds on OPT, i.e. (\forall X) OPT(X) >= max ( Wilber1(X), Wilber2(X)).

The point is that OPT is very hard to understand (maybe even NP-hard), whereas Wilber1&2 have simple Mathematical descriptions and there are linear-time algorithms to compute them.

Anonymous said...

Hi mihai, i have read your paper about the geometric view of BST and see how the offline algo greedyfuture becomes online in the geometric view. however my concern is the initial configuration of the tree. it seems that greedyASS executes query base on the initial configuration set by greedyfuture. for example, greedyfuture will always arrange its initial tree such that the first query will be for the root and greedyASS take advantage of this. but this would mean that greedyASS makes some assumption of future query and might not be totaly online? i hope you can answer this mihai. thanks.