Thursday, August 30, 2007

Love thy predecessor (I): In the beginning was the Word

Thinking about the predecessor problem is the CS-equivalent of a journey within, when you rediscover the very axioms of your existence. It is a center of the data-structure universe, yet it is so ingrained in us that few of us take the time to notice the problem exists. From this central, well-studied location, it shines a light on many controversies and research angles in the field.

The problem. Preprocess a set S of n numbers, such that the following query is answered efficiently: Given a number x, return predS(x) = max { yx | y in S }.

The most trivial solution is of course to sort S during preprocessing, and use binary search to answer queries. If we want a dynamic S, we can use a binary search tree instead. In fact, if somebody asks you what problem binary search trees solve, predecessor search is what you probably think of first.

The problem can be thought of as the online version of sorting (insert one number in a sorted list). It has countless small applications everywhere, plus some rather notable ones -- more later. A lot of very interesting mathematical understanding has been developed around the problem.

But before delving deeper, we have an interesting topic of discussion if we only consider the problem name...

The name. Of course, some people persistently call this "the successor problem". Whatever :) The real issue is that some people use "binary search" or "binary search tree" when they should say predecessor search. For example, it is popular to specify steps of algorithms like "... now binary search for this value in the set".

This is evil! Let us remind ourselves that we study:

• the dictionary problem, not hash tables.
• the predecessor problem, not binary search (trees).
• the nearest neighbor problem, not k-d trees.
• etc
The Book was written before time, and does not contain references to our meager inventions. Its chapters are dedicated to problems, and solving the problems (through one data structure or another) is what we're supposed to do. Studying some particular data structure we invented is missing the forest for the trees. Not the mention that it is a clear act of vanity, and after all superbia is the very first in SALIGIA.

I am always disappointed by programmers who don't understand the difference, and I tend to jump at the neck of academics who make the confusion :)

To give a recent example, a machine-learning expert was presenting a clustering algorithm in the terms "... and now you use the k-d tree to search for the nearest neighbor to this point...". With this mentality, it is understandable why the sport of adding heuristics to k-d trees is more popular than searching for a more efficient solution to the problem with a less myopic lens.

PS: This whole "misbranding" reminds me of Romania, where "sport shoes" are called "adidasi", "copy machines" are "xeroxuri", and so on. This doesn't seem to happen so much in English, but I'm sure there are examples. I look forward to hearing "I'll google it up on Yahoo".

Anonymous said...

Reminds me of Israel as well.

The historical reason is that while Americans, say, had lots of companies selling refrigerators, we only had fridgeders, because that's the only brand they would bring. There are better examples out there, but I can't seem to recall them.

It's the remnants of a monopolistic and not-heavily-consuming society.

Unknown said...

I look forward to hearing "I'll google it up on Yahoo".

Huh? This is already very common (only without the "up" or the "on Yahoo")!

See also: frisbee, kleenex, band-aid, philips screwdriver, Levi's, zipper, fiberglass, jello, dry ice, coke, LP, Q-tip, xerox, rolodex, scotch tape, muzak, Rubik's cube, superglue, and tivo. For more examples, google "genericized trademark".

Also, among programmers educated in the 1990s: "CLR[S]" instead of "algorithms textbook".

Mihai said...

Well, "to google something" is a common expression, but it has truly entered vocabulary if people stop using it in association to Google (for example, "google it on Yahoo!" would be proof enough that this has happenned).

rgrig said...

I've always thought that binary search is a way to search in a set with comparable things. Not to search a predecessor. The second seems more specific. But I'm waiting for the sequel.

Mihai said...

rgrig, you mean binary search can solve the dictionary problem (exact search). Sure, but that's not very informative since everybody agrees hashing is better.