Tuesday, November 24, 2009
Tuesday, November 10, 2009
A Simple Encoding Proof
In this post, I discuss a nice and simple example of an encoding proof, showing that maintaining partial sums require Ω(lg n) time per operation.
Go ahead and read the lecture notes in [PDF]. Or, if you prefer a more visual exposition, try the [PPTX] or [PPT] presentation. (These were extracted from my thesis and job talk.)
Go ahead and read the lecture notes in [PDF]. Or, if you prefer a more visual exposition, try the [PPTX] or [PPT] presentation. (These were extracted from my thesis and job talk.)
If you are teaching a data structures course, you should consider teaching this. (It's already done at MIT by Erik and UIUC by Jeff.) I feel it's quite improper to teach data structures without some lower bounds; this is akin to teaching algorithms without NP-completeness. Now, if you're going to teach a lower bound, this is probably the easiest you'll ever get (certainly teachable to undergrads), and it does prove something very interesting: that binary trees are optimal for aggregation-type problems.
Now, for a bit of amusing history. This result was the very first paper I ever wrote, back in SODA'04. In the fall of my freshman year, I asked a friend if there were any cool theory problems left to solve, and he suggested P vs NP as quite interesting. I googled up some useful definitions, and worked on it for several months -- unfortunately without much success :)
In the second semester, I convinced Erik to pay me to do theory research -- this is called exploitation of a confused young faculty by a freshman. Expecting that I should publish something to retain my job, I decided to work on simpler lower bound questions, which (amazingly!) were still said to be open on some internet pages. In particular, my google searches had revealed Miltersen's survey on cell-probe complexity, which said that an Ω(lg n) bound was a big challenge.
Arrogant as I am, I didn't let such things intimidate me, and I proved the bound. Of course, I hadn't heard of such things as entropy at the time, but I had learned about Kolmogorov complexity from Sipser's book, which I was reading to develop background on P vs NP. The concept was obvious: you simply count strings of length n and n-O(1), and conclude that there exist incompressible strings. Thus, my proof was in terms of incompressible strings. (A referee comment later suggested that the authors should learn the useful concept of entropy, so I read up on Wikipedia and changed the terminology in the paper.)
I then came to Erik to explain the proof (which didn't go well at all, since I was essentially just standing in front of a blackboard and saying "It's obvious!"), and to ask about writing a paper. He explained that there are these theory conferences "STOC" and "FOCS" and one on algorithms/theory with a more practical focus, called "SODA." He did not elaborate on the relative rankings of these, but he didn't have to, since the situation was obvious.
I decided to be bold and submit to "SODA." My paper was unfortunately all about theory, but it was about an important practical problem, and I had a very good lower bound for it, so maybe it would make the cut even in the top conference, which cared about practically-important research. If it was rejected, I would have to resign and just publish it along with all the purely-theoretical crap that probably fills these "STOC" and "FOCS" conferences.
The rest is history. I went to Romania during the summer, and had to finish the paper over my parents' 56K modem connection. It got accepted. At the conference, some people said "amazing" and others had no clue what an augmented binary tree was. And, for some reason, 6.5 years later, I am still doing theory...