Tuesday, December 22, 2009

Blog happenings

As you may have noticed, this blog has been on a bit of a hiatus. On the one hand, my expectations for politics-by-blog have decreased to a minimum and my boredom with the theory community has grown. On the other hand, I have not had enough time for exposition-by-blog --- or, better said, I discovered that my thought process is too obsessive and erratic to leave room for regular expository interruptions.


In an attempt to pull myself from the blog hiatus, let me respond to two recent blog posts, one guest post by Mikkel on My Biased Coin, and one post by Lipton.

My own reading of Mikkel's post goes something like this: "Hey, remember that big-big fuss from 1-2 years ago about simplicity as a virtue? How come it got hijacked by people wanting to introduce new models all the time, and didn't help the papers on real algorithmic problems, where simplicity matters the most? Do we need to start our own «Innovations in Real Computer Science» conference, or can we have a session at SODA instead?"

My own favorite example of a book algorithm is the "mod-3" suffix tree construction of [Kärkkäinen, Sanders ICALP'03]. This is an amazingly simple algorithm (much simpler than anything before), and truly feels like the "ultimate" algorithm for suffix tree construction. It also demonstrates a rare property: dividing into 3 parts in a recursive construction can work much better than dividing into 2. This algorithm should (and probably will) be in any Intro to Algorithms course.
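For the curious, here is a compact Python sketch of the construction (strictly speaking it builds the suffix array; see the typo discussion in the comments). The function names are mine, and to keep the sketch short the paper's radix passes are replaced by Python's built-in sort, so this version runs in O(n lg n) rather than linear time; the mod-3 recursion is the point.

def dc3_suffix_array(text):
    # Map characters to integers >= 1; 0 is reserved as a sentinel.
    return _dc3([ord(c) + 1 for c in text])

def _dc3(s):
    n = len(s)
    if n < 3:                                    # tiny inputs: brute force
        return sorted(range(n), key=lambda i: s[i:])

    n0, n1, n2 = (n + 2) // 3, (n + 1) // 3, n // 3
    n02 = n0 + n2
    t = s + [0, 0, 0]                            # sentinel padding

    # Step 1: sort the "sample" suffixes (positions i with i % 3 != 0) by
    # their first three characters; a dummy position n is added when
    # n % 3 == 1, exactly as in the paper.
    sample = [i for i in range(n + (n0 - n1)) if i % 3 != 0]
    sa12 = sorted(sample, key=lambda i: (t[i], t[i + 1], t[i + 2]))

    # Name the triples; equal triples share a name.
    r12 = [0] * n02                              # the recursive, 2/3-length text
    name, prev = 0, None
    for i in sa12:
        trip = (t[i], t[i + 1], t[i + 2])
        if trip != prev:
            name, prev = name + 1, trip
        r12[i // 3 if i % 3 == 1 else i // 3 + n0] = name

    # Step 2: if names are not unique, recurse on the 2/3-length text;
    # otherwise the triple sort already ordered the sample suffixes.
    if name < n02:
        sa_r = _dc3(r12)
        sa12 = [3 * j + 1 if j < n0 else 3 * (j - n0) + 2 for j in sa_r]

    rank12 = [0] * (n + 3)                       # rank 0 = "past the end"
    for r, i in enumerate(sa12):
        rank12[i] = r + 1

    # Step 3: sort the mod-0 suffixes by (first character, rank of the rest).
    sa0 = sorted(range(0, n, 3), key=lambda i: (t[i], rank12[i + 1]))

    # Step 4: merge; a sample suffix and a mod-0 suffix compare in O(1)
    # using one or two characters plus the sample ranks.
    sa12 = [i for i in sa12 if i < n]            # drop the dummy suffix
    out, p, q = [], 0, 0
    while p < len(sa12) and q < len(sa0):
        i, j = sa12[p], sa0[q]
        if i % 3 == 1:
            first12 = (t[i], rank12[i + 1]) <= (t[j], rank12[j + 1])
        else:
            first12 = (t[i], t[i + 1], rank12[i + 2]) <= (t[j], t[j + 1], rank12[j + 2])
        if first12:
            out.append(i); p += 1
        else:
            out.append(j); q += 1
    return out + sa12[p:] + sa0[q:]

For example, dc3_suffix_array("banana") returns [5, 3, 1, 0, 4, 2].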

So, was this algorithm the Best Paper in STOC/FOCS? Not even close --- in fact, most STOC/FOCS people have probably never heard of it. I also know from inside sources that the paper was viewed as somewhat borderline at ICALP (which is not too surprising --- as screwed up as US conferences are, we still don't compare to the European ones).

***

Moving on to Lipton's blog, we notice that most of the discussion is confused about the difference between running time and some (problem-dependent) objective function.

Some people claim that they are different because of Moore's law (your slower algorithm will be fast enough soon), or the fact that CPU cost is not as real as, say, the cost of shipping goods around by trucks in your optimization problem. I disagree with both of these views.

It's irrelevant if Moore's law will make your algorithm fast enough in 2 years; by then, the competition may have bankrupted you. And if you really believe in thinking long term, you should probably be working on quantum computers and worrying about the speed of light (we all know that a memory of N bits has diameter at least N^{1/3}, so memory accesses cannot be "constant time"). Sure, you may not say anything relevant for 50+ years, but then you will finally be avenged --- and you may be as influential as Babbage's computer was in the design of Java. My response to this long-run thinking is a well-known quote from economics: "People don't eat in the long run." (Harry Hopkins)
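For the record, the N^{1/3} is nothing deeper than a volume bound. Assuming each bit occupies at least some constant volume v and signals travel at speed at most c:

\[
N v \;\le\; \tfrac{4}{3}\pi r^{3}
\quad\Longrightarrow\quad
r = \Omega(N^{1/3}),
\qquad
T_{\mathrm{access}} \;\ge\; \frac{r}{c} = \Omega(N^{1/3}).
\]

(Luca's comment below sharpens the exponent to 1/2 once heat dissipation is taken into account.)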

How "real" CPU cost is will vary. If you're compressing Google's logs by 10%, you're likely to save 10% of a pretty big cost, a worthy goal. In time-constrained systems (like ATT routers), this cost is not smooth, but exhibits a phase transition. If you make this algorithm 30% faster, we can provide this extra functionality that competitors don't have, and convince some companies to switch to our network. If not, not. Adding 30% more computing power is essentially not an option.

So, no, the difference between running time and other optimization criteria is not that running time is less of a real cost. The difference is that we understand running time a whole lot better. Look, we spent decades understanding the usefulness and the shortcomings of asymptotic analysis, refining our model of the machine (the Word RAM, plus external-memory issues), and so on. When we talk about running time, we usually know what we're talking about.

On the other hand, every combinatorial optimization problem is a modelling challenge. The constraints are not quite like this, the graph is not quite general, the cost function is not quite right, and so on. Each important problem needs (and deserves) considerable modelling work before our theoretical predictions are as good as we may want.

But what do we do at STOC/FOCS? We model the problem in the simplest way, ignoring applications (which we are largely unaware of). Then, we prove Θ(lg n) approximation bounds (complete with PCP lower bounds) for the dollar cost of shipping goods by trucks (a problem now rebranded with a fancy name). Now just wait for those Amazon guys to come asking about how to improve their 5% approximation, and we can tell them the problem is perfectly understood!

Basically, a lower bound greater than 2 on a dollar cost is a formal proof that your formalization of the problem is wrong.

So, should we get our hands really dirty and look at the problems in such detail? I think if you're going to be passionate about approximation algorithms, you should do it at least a few times. (I, for one, did a lot of performance-oriented programming before my current work on theory of running times...)

I hope that if our approximation algorithms community gets a lot closer to the OR community, good things will happen. On the one hand, they might give us a break with their "understanding the nature of computation" claims, when they realize they have only a vague idea of what computation really is. On the other hand, nice theory might yet be developed which will talk about 3% improvements...

14 comments:

Luca said...

Actually, the diameter of an N-bit memory device must be at least N^{1/2}, because O(R^2) is an upper bound on the number of heat-producing things you can pack in a ball of radius R and still be able to dissipate the heat. Also, O(R^2) is an upper bound on the mass you can pack in a ball of radius R without it becoming a black hole.

For best black-hole-avoidance and heat-dissipation purposes, you should thinly spread your N memory registers on the surface of a ball of radius Theta(N^{1/2}).

Anonymous said...

I was going to bring up the holographic principle, but Luca beat me to it.

Your dismissal of the long-run view isn't convincing at all. Winsome little quotes are for small minds.

I agree with your arguments on practical approximation algorithms, though.

Also I love how you always take every opportunity to offend everyone, with your ICALP comment in this case.

Anonymous said...

Dear Mihai,
I am your big fan.
You speak the truth, even though you know people would not like it at all.
I do hope that once you become a senior professor, you will clean up the dirty world of academia.
Love and Respect,
Your fan,
V

Anonymous said...

Mihai is drunk on the anti-approximation-algorithms kool-aid and seems blinded. He seems to forget that plenty of approximation people work at research labs and in the industry and adapt their knowledge as necessary, just as other algorithms researchers do. Why does the simplex algorithm work so well in practice? Difficult to answer, because worst-case instances don't seem to arise. Same thing with approximation. It is not that the modeling is wrong when one proves that one cannot get better than a 2-approximation for a problem. It is simply that the worst-case instances do not always arise in practice. The same issue is there in running-time analysis as well. And people solve problems to within 5% in practice not always by changing the model: either they have easy instances or they throw sufficient resources at it. TCS has been highly successful in seeking new models, providing a lens through which to view computation, and also occasionally venturing into making practical contributions. Sure, we can all use more practice-oriented work and people. It is not clear that arrogance, a holier-than-thou attitude, and a clear lack of experience with the stuff one is criticizing are the best way.

Anonymous said...

Why is there so much obsession with FOCS/STOC/SODA? Can't we have an online publication medium for great textbook algorithms?

rgrig said...

I also know from inside sources that the paper was viewed as somewhat borderline at ICALP.

Reminds me of an article about reviews from four years ago. :)

Mihai said...

He seems to forget that plenty of approximation people work at research labs and in the industry and adapt their knowledge as necessary

Oh darn, this is what all those AT&T people have been doing!

Why does the simplex algorithm work so well in practice? Difficult to answer, because worst-case instances don't seem to arise.

Simplex works well in many cases, and we do have explanations for it. But FYI the world of LP-solving algorithms runs much deeper than simplex...

It is simply that the worst case instances do not always arise in practice.

This hard instance/easy instance thing is a cheap bail-out. It allows you to dismiss the reality (oh, those instances are different from ours), and continue to do whatever you were doing.

rgrig said...

For best black-hole-avoidance and heat-dissipation purposes, you should thinly spread your N memory registers on the surface of a ball of radius Theta(N^{1/2})

So if you put the processor in the center, the access time depends only on the amount of memory, not on the address. And if you want a hierarchy of memory, you use spheres with radii 1, 2, 4, 8, ... Nice.

Anthony Leverrier said...

Reminds me of an article about reviews from four years ago. :)

Strange how the first and last reviewers both use the same expression "standard and we must live with it".
Is it a coincidence?

rgrig said...

@tonio: The reviews are likely made up.

Anonymous said...

I wonder why nobody noticed the mod-3 suffix tree typo (contrast this with mod-3 suffix array). Maybe nobody has heard of it?

Mihai said...

Actually, suffix trees and suffix arrays are essentially the same thing, since you can convert between them in O(n) time. In practice, you want some post-processing anyway (perhaps arranging suffix arrays in a B-tree, and clustering suffix trees in a more cache-friendly way).
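To make the O(n) claim a bit more concrete, here is one half of the conversion as a Python sketch (names mine): computing the LCP array from the suffix array in linear time, following Kasai et al. (CPM 2001). Given the suffix array plus the LCP array, the suffix tree follows from a standard linear-time stack-based construction, which I omit.

def lcp_from_suffix_array(s, sa):
    # lcp[k] = length of the longest common prefix of the suffixes
    # starting at sa[k-1] and sa[k]; lcp[0] = 0.
    n = len(s)
    rank = [0] * n
    for k, pos in enumerate(sa):
        rank[pos] = k
    lcp = [0] * n
    h = 0
    for i in range(n):                     # process suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]            # suffix just before i in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1                     # h drops by at most 1 per step: O(n) total
        else:
            h = 0
    return lcp

# e.g. lcp_from_suffix_array("banana", [5, 3, 1, 0, 4, 2]) -> [0, 1, 3, 0, 0, 2]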

Anonymous said...

These O(n)'s are really suspect when talking about suffix trees/arrays. You can create a suffix array or tree in O(n) time in the same sense that you can sort n values in O(n) time. This isn't normally a claim one makes about sorting without some sort of explanation.

Mihai said...

If the alphabet is at most poly(n), one can sort in linear time by radix sort. In reality, it is 8 or 16 bits, so this is certainly the case.

If the alphabet is larger, just hash it down. For suffix trees, this makes no difference (the order of the children of a node is not defined); for suffix arrays, this reorders the alphabet, which doesn't make much difference in most applications. If you really want the proper sorting, you have to add some sort(Sigma) term, which I have always treated at pedantry. Sorting is well studied in other contexts, and not the nature of this problem.