Go ahead and read the lecture notes in [PDF]. Or, if you prefer a more visual exposition, try the [PPTX] or [PPT] presentation. (These were extracted from my thesis and job talk.)
If you are teaching a data structures course, you should consider teaching this. (It's already done at MIT by Erik and UIUC by Jeff.) I feel it's quite improper to teach data structures without some lower bounds; this is akin to teaching algorithms without NP-completeness. Now, if you're going to teach a lower bound, this is probably the easiest you'll ever get (certainly teachable to undergrads), and it does prove something very interesting: that binary trees are optimal for aggregation-type problems.
Now, for a bit of amusing history. This result was the very first paper I ever wrote, back in SODA'04. In the fall of my freshman year, I asked a friend if there were any cool theory problems left to solve, and he suggested P vs NP as quite interesting. I googled up some useful definitions, and worked on it for several months -- unfortunately without much success :)
In the second semester, I convinced Erik to pay me to do theory research -- this is called exploitation of a confused young faculty by a freshman. Expecting that I should publish something to retain my job, I decided to work on simpler lower bound questions, which (amazingly!) were still said to be open on some internet pages. In particular, my google searches had revealed Miltersen's survey on cell-probe complexity, which said that an Ω(lg n) bound was a big challenge.
Arrogant as I am, I didn't let such things intimidate me, and I proved the bound. Of course, I hadn't heard of such things as entropy at the time, but I had learned about Kolmogorov complexity from Sipser's book, which I was reading to develop background on P vs NP. The concept was obvious: you simply count strings of length n and n-O(1), and conclude that there exist incompressible strings. Thus, my proof was in terms of incompressible strings. (A referee comment later suggested that the authors should learn the useful concept of entropy, so I read up on Wikipedia and changed the terminology in the paper.)
I then came to Erik to explain the proof (which didn't go well at all, since I was essentially just standing in front of a blackboard and saying "It's obvious!"), and to ask about writing a paper. He explained that there are these theory conferences "STOC" and "FOCS" and one on algorithms/theory with a more practical focus, called "SODA." He did not elaborate on the relative rankings of these, but he didn't have to, since the situation was obvious.
I decided to be bold and submit to "SODA." My paper was unfortunately all about theory, but it was about an important practical problem, and I had a very good lower bound for it, so maybe it would make the cut even in the top conference, which cared about practically-important research. If it was rejected, I would have to resign and just publish it along with all the purely-theoretical crap that probably fills these "STOC" and "FOCS" conferences.
The rest is history. I went to Romania during the summer, and had to finish the paper over my parents' 56K modem connection. It got accepted. At the conference, some people said "amazing" and others had no clue what an augmented binary tree was. And, for some reason, 6.5 years later, I am still doing theory...
Very nice post, Mihai!
Thanks for taking the time to write the proof down. I've wanted to look at your result for a while now but didn't have the time to do so.
Now at 5 pages I think it'll be manageable, even though it's not exactly a light read.
What was up with the previous comment (in Romanian)? Was the dude for real?
Actually, I'm hoping it's an easy read... Let me know if anything is unclear.
The Romanian comment was for real, in the sense that several Romanian papers carried articles saying that I'm coming back to Romania to open a new Microsoft group there. It's also not for real, in the sense that I'm doing no such thing, and in fact Microsoft never talked to me about such an idea.
Wonderful note. This is the first application of entropy that I've read. Cool stuff, actually. Easy, understandable and informative.
This comment is unrelated to the post, but I thought I would share this. Today a CS grad student told me that he has never even heard of Erik Demaine (this guy works in theory, by the way).
But, nonetheless, he is a HUGE FAN of Mihai Pătraşcu. He says Mihai is one of the best things in CS. Credit to this blog!
Cheers!
Mihai, why is Erik your co-author in this paper? Wasn't it you who proved the result? What was Erik's contribution?
No contribution to this particular paper, except indirectly through funding etc...
If funding is a reason for authorship, then everybody at NSF should be your co-author!
BTW, why do you require your coding to be prefix-free? I thought that the inequality H(X) <= EC(X) holds for every uniquely decodable C.
You can encode n bits with roughly n-1 expected length if you don't require prefix-freeness. You assign two possibilities to {0,1}; then four possibilities to {0,1}^2; etc. You are only left with 2 strings to assign an n-bit encoding.
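The assignment described above can be sketched in a few lines of Python. This is only an illustration, not anything from the paper: the function names `short_code` and `expected_length` are made up here, and the source strings are assumed to be uniformly distributed. Codewords are handed out shortest first, starting from length 1 as in the comment.

```python
from fractions import Fraction
from itertools import product


def short_code(n):
    """Map every n-bit string to a distinct nonempty binary codeword,
    handing out the shortest codewords first (hypothetical helper)."""
    sources = ["".join(bits) for bits in product("01", repeat=n)]
    codewords = []
    length = 1
    while len(codewords) < len(sources):
        # Use up all codewords of the current length before moving on.
        for bits in product("01", repeat=length):
            codewords.append("".join(bits))
            if len(codewords) == len(sources):
                break
        length += 1
    return dict(zip(sources, codewords))


def expected_length(code):
    """Average codeword length when all source strings are equally likely."""
    return Fraction(sum(len(c) for c in code.values()), len(code))


code = short_code(3)
print(expected_length(code))  # 2, strictly below n = 3
print(code["000"], code["010"])  # "0" and "00": one codeword is a prefix of another
```

The map is injective on single strings, yet the average length falls below n, which is below the entropy of a uniform n-bit string. That is possible only because the code is not prefix-free, which is why the entropy bound needs some such requirement.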
But your code is not uniquely decodable, in the sense that 00 is an encoding of two different strings: one of length n, and the other of length 2n.
The code in your paper is uniquely decodable in this sense. So you can sharpen your bound as follows:
2w * E[IT(v)] >= H[A].
Forget my comment; I realized where I was wrong.
Dear Mihai,
This is off-topic for this post, but I was wondering if you could at some point express your views about what you currently find to be interesting questions in the part of theoretical CS that is inspired by the "real" world. For example, some people I know think that research on algorithms for multi-threaded processes is horribly neglected, resulting in people in industry having to cook up their own solutions to what are already pressing questions.
Anonymous at 2:10, I think it might be funny for you to hear what Mihai thinks about research on multicore systems.
anonymous 1:31:
(a) multicore was an example
(b) funny? Mihai may think it's stupid to do research on multicore systems (he thinks many things are stupid it seems), but since he is a smart and knowledgeable person, perhaps I expect to learn from what he thinks, whatever it may be.
I don't have anything intelligent to say about multithreaded computation at the moment. At ATT, I have not (yet?) seen any good application, nor have I seen a good model to work in. The same is true w.r.t. my time at IBM and my frequent interactions with the Google folk.
If you are in a place where you can see real, compelling examples, you should try to bring some of them to light and (if possible) define a model that captures the interesting aspects of such computation.
What I am against is "multicore is here, let's start doing PRAM again." We kind of understand why PRAM failed (communication cost dwarfs everything), so our new thinking should avoid these problems. And, if by some miracle the new architecture can really implement PRAM, there is essentially nothing left to do :)
Dear Mihai, is it possible to learn a (nearly) perfect hashing function by Machine Learning?
Sorry, anon, I do not really understand the question...
I think your presentation would be slightly simplified if you used the bit-reversal permutation to order the adversary's operations instead of a random permutation. That way (using a binary tree on power-of-two subsequences) the indices from the left and right children of each tree node would always interleave perfectly.
It should also get a slightly better constant factor, but that's less important because the constant factor is still some way off from the upper bounds.
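For readers who have not met bit-reversal permutations, here is a small Python sketch of the interleaving property the comment refers to. It is an illustration under assumptions, not the construction from the notes: the helper names `bit_reversal_permutation` and `children_interleave` are made up, and "tree node" is taken to mean an aligned power-of-two block of positions.

```python
def bit_reversal_permutation(k):
    """Return the list whose entry i is i with its k-bit binary
    representation reversed (a permutation of 0 .. 2**k - 1)."""
    return [int(format(i, f"0{k}b")[::-1], 2) for i in range(1 << k)]


def children_interleave(perm):
    """Check that, for every aligned power-of-two block of positions,
    the values in the left half and the right half alternate when sorted."""
    n = len(perm)
    size = n
    while size >= 2:
        half = size // 2
        for start in range(0, n, size):
            left = [(v, "L") for v in perm[start:start + half]]
            right = [(v, "R") for v in perm[start + half:start + size]]
            sides = [side for _, side in sorted(left + right)]
            # Perfect interleaving: no two neighbours from the same child.
            if any(a == b for a, b in zip(sides, sides[1:])):
                return False
        size = half
    return True


print(bit_reversal_permutation(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
print(children_interleave(bit_reversal_permutation(4)))  # True
print(children_interleave(list(range(8))))  # False for the identity order
```

At the root, for example, the left half holds the even values and the right half the odd ones, so every value from one child sits between two values from the other.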
I thought of that, but eventually convinced myself it's only simpler if you already know bit-reversal permutations :) I don't really know.