Thursday, October 11, 2007

Range Queries (I)

I am now in Bonn, spending some time talking to Yakov Nekritch about range queries. Since this is a field I care about a lot, I will write a series of posts on it.

First, let me begin with my spiel on the importance of studying range queries. This is taken from the beginning of my talk at UW in May.

Once upon a time, you go to a bar in NY and you're trying to pick up a nice woman. She tells you she works in HR, and you tell her you are a theoretical computer scientist. She confesses she has not idea what that means, at which point you clarify "Oh, I am trying to understand the fundamental nature of computation". The blank stare you get is a clear indication that some more clarification is needed.

What do you then explain at this crucial moment of the night? Though I have not tried the following options myself, the prevailing opinion is that they are ill-advised:

  • I am trying to prove lower bounds for constant depth circuits with mod-6 gates.
  • I am trying to extract pure randomness from 2 independent sources with small min entropy.
  • I am trying to design codes of subexponential size with efficient local decodability.
  • I am trying to show that a diagonalization argument cannot separate P and NP, even if it is given a low degree extension of an oracle.
One can complain that too few people understand what computing and computation actually mean. That is a valid complaint, and we should indeed work with precollege education to change things.

But here you're in luck. The gorgeous HR woman actually uses computers every day, and she knows exactly what they're supposed to do. Chances are, she even has some exposure to a programming language! That is of course SQL, which is very useful for her work. The only trouble is -- for her, and for most of the planet, computers are a great tool for data storage and manipulation, not a great tool for finding Hamiltonian paths.

So what do you tell her? Anybody who has attended 5 minutes of a SQL tutorial has seen examples like:
SELECT * FROM employees
WHERE salary <= 70000 AND startdate <= 1998
In fact, she probably runs code like this on a daily basis. Thus, she already understands orthogonal range queries -- formally, you are given a set of points, and you ask for points in some rectangle.

You can now explain that your work is about how fast such queries can be executed, and, with her curiosity satisfied, move to more charming topics.

PS: This post is not about bad-mouthing complexity theory. For a long time, I have considered myself a complexity theorist, and I am interested in many topics in the field, well beyond my own research. The funny references are to some very particular young-and-eager complexity theorists, who have occasionally exasperated me with comments like "Yeap, data structures can be interesting sometimes, but really our goal is to understand computation". The point, if it was not clear enough, is that data structures are certainly not more artificial as our other computational questions -- maybe most computational questions that computer users care about are data-structure (database) questions.


AC said...

Mihai, don't you have a SODA paper to finalize? :)


rgrig said...

Looking forward to the next episode :)

rgrig said...

Is there a survey on this subject that you would recommend?