Tuesday, December 22, 2009

Blog happenings

As you may have noticed, this blog has been on a bit of a hiatus. On the one hand, my expectations for politics-by-blog have decreased to a minimum and my boredom with the theory community has grown. On the other hand, I have not had enough time for exposition-by-blog --- or, better said, I discovered that my thought process is too obsessive and erratic to leave room for regular expository interruptions.


In an attempt to pull myself from the blog hiatus, let me respond to two recent blog posts, one guest post by Mikkel on My Biased Coin, and one post by Lipton.

My own view of Mikkel's post reads like: "Hey, remember that big-big fuss from 1-2 years ago about simplicity as a virtue? How come this was hijacked by people wanting to introduce new models all the time, and didn't help the papers on real algorithmic problems, where simplicity matters the most? Do we need to start our own «Innovations in Real Computer Science» conference, or can we have a session at SODA instead? "

My own favorite example of a book algorithm is the "mod-3" suffix tree construction of [Kärkkäinen, Sanders ICALP'03]. This is an amazingly simple algorithm (much simpler than anything before), and truly feels like the "ultimate" algorithm for suffix tree construction. It also demonstrates a rare property: dividing into 3 parts in a recursive construction can work much better than dividing into 2. This algorithm should (and probably will) be in any Intro to Algorithms course.

So, was this algorithm the Best Paper in STOC/FOCS? Not even close --- in fact, most STOC/FOCS people have probably never heard of it. I also know from inside sources that the paper was viewed as somewhat borderline at ICALP (which is not too surprising --- as screwed up as US conferences are, we still don't compare to the European ones).

***

Moving on to Lipton's blog, we notice that most of the discussion is confused about the difference between running time and some (problem-dependent) optimization function.

Some people claim that they are different because of Moore's law (your slower algorithm will be fast enough soon), or the fact that CPU cost is not as real as, say, the cost of shipping goods around by trucks in your optimization problem. I disagree with both of these views.

It's irrelevant if Moore's law will make your algorithm fast enough in 2 years; by then, the competition may make you bankrupt. And if you really believe in thinking long term, you should probably be working on quantum computers and worrying about the speed of light (we all know that a memory of N bits has diameter at least N^(1/3), so memory accesses cannot be "constant time"). Sure, you may not say anything relevant for 50+ years, but then you will finally be avenged --- and you may be as influential as Babbage's computer was in the design of Java. My response to this long run thinking is a well-known quote from economics: "People don't eat in the long run." (Harry Hopkins)

How "real" CPU cost is will vary. If you're compressing Google's logs by 10%, you're likely to save 10% of a pretty big cost, a worthy goal. In time-constrained systems (like ATT routers), this cost is not smooth, but exhibits a phase transition. If you make this algorithm 30% faster, we can provide this extra functionality that competitors don't have, and convince some companies to switch to our network. If not, not. Adding 30% more computing power is essentially not an option.

So, no, the difference between running times and other optimization criteria is not that it is less of a cost. The difference is that we understand running times a whole lot better. Look, we spent decades understanding the usefulness and the shortcomings of asymptotic analysis, refining our computer model (Word RAM + external memory issues), etc. When we talk about running time, we usually know what we're talking about.

On the other hand, every combinatorial optimization problem is a modelling challenge. Constraints are not quite like this, the graph is not quite general, this cost function is not quite right etc etc. Each important problem needs (and deserves) considerable modelling work before our theoretical predictions are as good as we may want.

But what do we do at STOC/FOCS? We model the problem in the simplest way, ignoring applications (which we are largely unaware of). Then, we prove Θ(lg n) approximation bounds (complete with PCP lower bounds) for the dollar cost of shipping goods by trucks (a problem now rebranded with a fancy name). Now just wait for those Amazon guys to come asking about how to improve their 5% approximation, and we can tell them the problem is perfectly understood!

Basically, a lower bound greater than 2 on a dollar cost is a formal proof that your formalization of the problem is wrong.

So, should we get our hands really dirty and look at the problems in such detail? I think if you're going to be passionate about approximation algorithms, you should do it at least a few times. (I, for one, did a lot of performance-oriented programming before my current work on theory of running times...)

I hope that if our approximation algorithms community gets a lot closer to the OR community, good things will happen. On the one hand, they might give us a break with their "understanding the nature of computation" claims, when they realize they have only a vague idea of what computation really is. On the other hand, nice theory might yet be developed which will talk about 3% improvements...

Friday, December 4, 2009

Talks

Update: The 2nd talk is happening 6pm-8pm in Amfiteatrul Pompeiu (= the amphitheater on the 2nd floor).

I am giving two talks in Bucharest this coming week:
  • a more practical one at Poly on Monday (December 7) at 6pm in room EC 101 (Fac. Automatică şi Calculatoare)
  • a more theoretical one at UniBuc on Friday (December 11). Room and time to be announced here, but it will be earlier in the afternoon.
Abstracts below. Don't forget to vote on Sunday.

  1. Tabulation hash functions

    Implementing hash tables with linear probing is more efficient than other implementations if the hash function is sufficiently random. Unfortunately, linear probing behaves very badly in conjunction with multiplication-based hash functions (the most commonly used in practice), and as such, this algorithm is often avoided.

    I will describe a very simple hash function, which is as fast as multiplication on current processors, but is based on indexing into precomputed tables (a small illustrative sketch follows the abstracts). A mathematical analysis shows that this function guarantees constant running time for hash tables.

  2. Negative results for data structures

    How do we prove that certain algorithmic results are impossible to obtain? For example, how do we prove that no linear-space data structure can support range queries in constant time? In this lecture, I will describe a complete proof of this result, going through several simple but interesting steps.
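
For the curious, here is a minimal sketch of simple tabulation hashing, the kind of table-based function the first talk is about. The constants, table sizes, and seeding are my own illustration, not the exact code from the talk: a 32-bit key is split into 4 bytes, each byte indexes its own precomputed table of random words, and the lookups are XORed together.

    #include <cstdint>
    #include <random>

    // Illustrative sketch of simple tabulation hashing (not the talk's exact code):
    // split the key into characters, look each one up in a precomputed table of
    // random words, and XOR the results together.
    struct TabulationHash {
        uint32_t table[4][256];

        explicit TabulationHash(uint64_t seed = 1) {
            std::mt19937_64 rng(seed);
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 256; j++)
                    table[i][j] = (uint32_t)rng();   // fill the tables with random bits
        }

        uint32_t operator()(uint32_t key) const {
            uint32_t h = 0;
            for (int i = 0; i < 4; i++)
                h ^= table[i][(key >> (8 * i)) & 0xFF];   // one lookup per byte
            return h;
        }
    };

Evaluating the hash costs four lookups in small, cache-resident tables plus a few XORs, which is why it can compete with a single multiplication on current processors.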

Tuesday, November 24, 2009

FOCS 2010

The FOCS 2010 website is already up. This promises to be a very interesting conference.

Tuesday, November 10, 2009

A Simple Encoding Proof

In this post, I discuss a nice and simple example of an encoding proof, showing that maintaining partial sums requires Ω(lg n) time per operation.


Go ahead and read the lecture notes in [PDF]. Or, if you prefer a more visual exposition, try the [PPTX] or [PPT] presentation. (These were extracted from my thesis and job talk.)

If you are teaching a data structures course, you should consider teaching this. (It's already done at MIT by Erik and UIUC by Jeff.) I feel it's quite improper to teach data structures without some lower bounds; this is akin to teaching algorithms without NP-completeness. Now, if you're going to teach a lower bound, this is probably the easiest you'll ever get (certainly teachable to undergrads), and it does prove something very interesting: that binary trees are optimal for aggregation-type problems.
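
If you want the matching upper bound on the board next to it, here is a minimal sketch of the standard binary-tree solution (a Fenwick tree; my own illustration, not taken from the lecture notes). Updates and prefix sums both take O(lg n) time, which is exactly the bound that the lower bound shows optimal.

    #include <vector>

    // Fenwick (binary indexed) tree over A[1..n]:
    //   update(i, delta) adds delta to A[i];  sum(i) returns A[1] + ... + A[i].
    // Each operation walks one root-to-leaf path of an implicit binary tree, so O(lg n).
    struct Fenwick {
        std::vector<long long> t;
        explicit Fenwick(int n) : t(n + 1, 0) {}

        void update(int i, long long delta) {
            for (; i < (int)t.size(); i += i & (-i))
                t[i] += delta;
        }

        long long sum(int i) const {
            long long s = 0;
            for (; i > 0; i -= i & (-i))
                s += t[i];
            return s;
        }
    };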

Now, for a bit of amusing history. This result was the very first paper I ever wrote, back in SODA'04. In the fall of my freshman year, I asked a friend if there were any cool theory problems left to solve, and he suggested P vs NP as quite interesting. I googled up some useful definitions, and worked on it for several months -- unfortunately without much success :)

In the second semester, I convinced Erik to pay me to do theory research -- this is called exploitation of a confused young faculty by a freshman. Expecting that I should publish something to retain my job, I decided to work on simpler lower bound questions, which (amazingly!) were still said to be open on some internet pages. In particular, my google searches had revealed Miltersen's survey on cell-probe complexity, which said that an Ω(lg n) bound was a big challenge.

Arrogant as I am, I didn't let such things intimidate me, and I proved the bound. Of course, I hadn't heard of such things as entropy at the time, but I had learned about Kolmogorov complexity from Sipser's book, which I was reading to develop background on P vs NP. The concept was obvious: you simply count strings of length n and n-O(1), and conclude that there exist incompressible strings. Thus, my proof was in terms of incompressible strings. (A referee comment later suggested that the authors should learn the useful concept of entropy, so I read up on Wikipedia and changed the terminology in the paper.)
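
For readers who have not seen the counting, it fits in one display (standard material, not specific to the paper):

    \[
    \#\{\text{strings of length } n\} = 2^n,
    \qquad
    \#\{\text{descriptions of length} \le n-c\} \le \sum_{i=0}^{n-c} 2^i = 2^{n-c+1} - 1 < 2^n \quad (c \ge 1),
    \]

so some string of length n has no description of length n-c or shorter, i.e. it is incompressible up to c bits.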

I then came to Erik to explain the proof (which didn't go well at all, since I was essentially just standing in front of a blackboard and saying "It's obvious!"), and to ask about writing a paper. He explained that there are these theory conferences "STOC" and "FOCS" and one on algorithms/theory with a more practical focus, called "SODA." He did not elaborate on the relative rankings of these, but he didn't have to, since the situation was obvious.

I decided to be bold and submit to "SODA." My paper was unfortunately all about theory, but it was about an important practical problem, and I had a very good lower bound for it, so maybe it would make the cut even in the top conference, which cared about practically-important research. If it was rejected, I would have to resign myself to publishing it along with all the purely-theoretical crap that probably fills these "STOC" and "FOCS" conferences.

The rest is history. I went to Romania during the summer, and had to finish the paper over my parents' 56K modem connection. It got accepted. At the conference, some people said "amazing" and others had no clue what an augmented binary tree was. And, for some reason, 6.5 years later, I am still doing theory...

Friday, October 23, 2009

Encoding Proofs

Various fields have various notions of "nice proofs," be they combinatorial, or elementary, or bijective. In TCS, perhaps the correct standard for lower bound proofs should be "encoding proofs." In these proofs, one starts with the assumption that some algorithm exists, and derives from that some impossible encoding algorithm, e.g. one that can always compress n bits into n-1 bits.


A normal lower bound will have a lot of big-bad-ugly statements -- "there are at least A bad sets (cf Definition 12), each containing at least B elements, of which at most a fraction of C are ugly (cf Definition 6)". To deal with such things, one invokes concentration bounds left and right, and keeps throwing away rows, columns, elements, and any hope that the reader will not get lost in the details.

There are 3 huge problems with this:
  1. Most lower bounds cannot be taught in a regular class. But we can't go on saying how problems like P-vs-NP are so awesome, and keep training all our graduate students to round LPs better and squeeze randomness from stone.

  2. The reader will often not understand and appreciate the simple and beautiful idea, as it is too hard to pull apart from its technical realization. Many people in TCS seem to think lower bounds are some form of dark magic, which involves years of experience and technical development. There is certainly lots of dark magic in the step where you find small-but-cool tricks that are the cornerstone of the lower bound; the rest can be done by anybody.

  3. You start having lower-bounds researchers who are so passionate about the technical details that they actually think that's what was important! I often say "these two ideas are identical" only to get a blank stare. A lower bound idea never talks about entropy or rectangle width; such things are synonymous in the world of ideas.

Proofs that are merely an algorithm to compress n bits have elegant linearity properties (entropy is an expectation, therefore linear), and you never need any ugly concentration. Anybody, starting with a mathematically-mature high school student, could follow them with some effort, and teaching them is feasible. Among researchers, such proofs are games of wits and creativity, not games involving heavy guns that one may or may not have in their toolkit.

***

My paper on lower bounds for succinct rank/select data structures was submitted to SODA in extraordinary circumstances. I had been travelling constantly for a month, and the week before the deadline I was packing to move out of California and was down with the flu. In the end, the only time I found for the paper was on an airplane ride to Romania, but of course I had no laptop since I had just quit IBM. So I ended up handwriting the paper on my notepad, and quickly typing it in on my sister's laptop just in time for the deadline.

[ You would be right to wonder why anybody would go through such an ordeal. I hate submitting half-baked papers, and anyway I wanted to send the paper to STOC. But unfortunately I was literally forced to do this due to some seriously misguided (I apologize for the hypocritical choice of epithets) behavior by my now-coauthor on that paper. ]

If you have 8 hours for a paper, you use all the guns you have, and make it work. But after the paper got in, I was haunted by a feeling that a simple encoding proof should exist. I learnt long ago not to resist my obsessions, so I ended up spending 3 weeks on the paper -- several dozen times more than before submission :). I am happy to report that I found a nice encoding proof, just "in time" for the SODA camera-ready deadline. (As you know, in time for a camera-ready deadline means 2 weeks and 1 final warning later.)

The paper is here, if you are interested.

Thursday, October 8, 2009

Nobels

Since we've been talking about prizes, let me mention the recently announced Nobel awards for 2009.


In Physics, half the prize goes to two retired researchers from the old Bell Labs, Willard Boyle and George Smith. According to Wikipedia, this is the 7th time the Nobel prize has been won for work done at Bell Labs.

Of course, the Lab is not what it used to be. After the famous AT&T/Lucent split of 1996, AT&T Bell Labs split into Lucent's Bell Labs (currently in a somewhat precarious state), and AT&T Labs, Inc. AT&T Labs experienced massive corporate pain at one point (famously firing an entire machine learning group in one shot), but currently appears to be in good shape (for instance, two machine learning researchers from AT&T were in the team that won the recent Netflix Challenge).

Of course, there is no active Physics research going on at the Labs today, so the Nobel is only about past glory. But could the Labs win a Turing award for current work? Possibly, although it's certainly not a top contender. At least some baby steps are being taken to turn this place back into a real research lab: Howard Karloff recently bought a ping-pong table.

***

In other news, Herta Müller won the Nobel in literature. She is a Romanian-born Banat Swabian (a German ethnic group in Romania and Serbia). She lived her first 34 years in Romania, emigrating to Germany in 1987. Her work focuses on the harsh life under Romanian communism, and so she was not allowed to publish freely before leaving Romania.

In Romania, her work is essentially unknown -- as you might imagine, we have enough Romanian-language authors writing on the same topic, some of whom are unquestionably very good.

Friday, October 2, 2009

Follow-up

My previous blog post generated a record number of comments (74 as I write this). Of course, this is not necessarily something to write home about, but I am proud that we might have passed the threshold of 5 intelligent comments to a post.

I pulled away from it due to some travel (which was a good idea anyway), so let me get back with some follow-up observations.

1. Michael Mitzenmacher discussed the post, and the comments there are quite interesting (or depressing, depending on your point of view). I have always said that TCS is headed for bad trouble given how we educate the current generation of students. We will end up with a batch of researchers who have never heard of 90% of the problems at the intersection of TCS and the rest of CS, who successfully turn our discipline into philosophy while daydreaming about turning it into pure mathematics.

As you may see among the comments, there are people who have actually never heard of those problems, yet are fully confident in their comprehensive understanding. When faced with commentators who actually like the problems, there are only two logical conclusions open to them: these guys are either insane, or Mihai himself (two possibilities that are not mutually exclusive, of course).

Now, one cannot take conclusions based on blog discussions too seriously. But I will volunteer anecdotal evidence from a talk at Weizmann: when I asked who knew what a "Voronoi diagram" was, a few sporadic hands came up. (Go try this at home!) The problem: Weizmann is an excellent school, educating many of our future colleagues.


2. Some people asked whether I would have gone to UCSD, had they made me an offer first. This is impossible to tell. I was certainly considering it wholeheartedly, but I didn't even have the answers from all places at the time, so I cannot know how my opinion would have evolved.

Many commentators seem to have missed that the post was about psychology, not logic. I did not explain how (1) UCSD made an offer to Costis implied (2) I rejected UCSD; I explained my perception of (1). Indeed, you are missing some facts that contributed to "(1) implies (2)," at least some of which are not appropriate for blogs --- even by my unorthodox standard.


3. Another thing that I did not comment on (except inside some people's minds) was the work of Costis. If you want to hear his work being put down, you really don't need me; tune in to the appropriate non-blog channels and I can guarantee you won't be disappointed. (This is to be expected for any person who achieved some degree of notoriety at an early age, and perhaps got more hype than he had bargained for.)

In fact, my opinion is quite the opposite: I find Costis to be a very deep thinker, both technically and meta-technically. We repeatedly went for beers and I can report no significant disagreements were found, in spite of liberal comments during said encounters :). For example, due to my algorithmic sensibilities, it is quite clear that I consider the complexity of computing Nash to be very important (it is a concrete, central, and structurally-interesting algorithmic question).


4. Many negative comments were knee-jerk reactions, fully reflecting people's frustrations and insecurities. In the US culture, it would be customary to apologize for insulting people's sensibilities (I never figured out whether such apologies are interpreted as sincere, or they are merely correct protocol). Of course, things are rather different in my own culture, and it has never been my priority to tread lightly. So let me offer the typical Romanian advice in such circumstances: "Don't worry; life may be hard, but it passes quickly."


5. Some more commentators found it hard to accept my comments given "my position" (whatever they interpreted my position to be). The most constructive of them told me to look at Hamming's well-known speech so that I may learn "how it is possible to get the considerable benefits of egotism without needlessly pissing people off."

I find this particularly funny, since I once learned the position of a great information theorist, a contemporary of Hamming. It was roughly "Eh, Hamming... He's just arrogant, he never had any really good results."

This reminds me of something that a friend told me a long time ago: "the most crucial ingredient in the greatest paintings is the light bulb in the room; even the work of the best painters is quite dull in a dark room."

If you find that your point of view doesn't allow you to see what is being said, perhaps you can try another one temporarily.


6. Finally, let me respond to somebody who seems to write politely and in good faith:
I was on a hiring committee in one of these schools that decided not to interview you. Although I hesitated to post this comment, I think what I have to say will be helpful to your career.
Thank you for the comment; I appreciate the difficulty in writing on such a topic.
The reason we decided against further considering your case was because of your reputation as a very difficult, arrogant, and opinionated person. We even read your blog and found many posts that confirmed this reputation.
I am aware that suggestions of this form were made. Of course, I may point out the significant pitfalls in searching for evidence (in a blog, of all places) for something that you are already biased to believe. I may also point out that the places that actually know me (the labs where I did internships) both made me offers, and a notable fraction of the universities where I interviewed were also interested in hiring me. So the suggestions may be a bit exaggerated, or perhaps there is a high variance in how people perceive me. If for some reason your university finds itself interested in my case, I would consider a conversation at a conference or an invitation for a talk as more reliable courses of action.
At least in our university, a post like this one significantly hurts your chances of getting a job. We don't want to hire a person who writes a blog post insulting their departmental colleagues every time they're not nominated for a fellowship or an award. "I can't believe my department decided to nominate X for a Sloan Fellowship when my research is so much deeper than X's."
If your department has no faculty engaging in the common activity of "bitching and moaning," I am not sure whether I should envy you for your luck, or worry about the hyper-formal environment that you may have in place. It is safe to say that I am not good at keeping rank formation; but it is also fair to say that I choose the battles where I pull out of the formation very selectively.
You're smart, but there are *many* (equally) smart (or smarter) people who are also publicly nice.
It is my hope that I will prove you wrong on this statement.

Thursday, September 10, 2009

Labs vs Academia

The topic of the day seems to be labs vs academia (sparked by this, and indirectly by Muthu, picked up by Michael Mitzenmacher and Jon Katz). I thought I would contribute my own perspective, since I've been in labs for about a year, and went through university/lab and lab/lab transitions recently.


Location. For those of us with families, location is one of the main factors. It is not easy to get a job in a top lab, but it is much easier than in a top university, simply because a lot more people go through the lab system (see below). Consider the three prime markets: NY, Bay Area, and Boston.
  • In the Bay Area, Berkeley / Stanford weren't interested in interviewing me, but IBM gave me jobs (both a temporary and a full-time one, which was great).
  • In the NY area, Princeton / NYU / Columbia weren't interested, but, again, ATT gave me a job.
  • In Boston, the problem remains unsolved since MSR New England is not in the mood to hire young people. But arguably there are many more non-top-but-good universities, so this compensates (around San Francisco, for instance, this alternative is entirely lacking).
For me, location made the decision between IBM and ATT (since California tries quite hard to keep foreign medical doctors away).

To see the difference between the chance of getting a job in labs vs universities, think about the numbers: IBM, ATT, and MSR Silicon Valley were willing to hire in theory (and they did) on my particular year, but among the top universities only MIT did.

As for the non-prime locations, there is a wide range between almost-prime and really-avoid. I had offers from some almost-prime ones (Atlanta and San Diego), and I think I would've been happy with them.

However, I want to add that I don't buy Jon Katz's argument that non-prime locations are cheaper. Think about what you want; the cost depends very much on your utility function. For me, things that matter include flight time (and cost) to Europe, availability of Romanian/Eastern European wine and food, and closeness to mountains (the worst thing about NY). Things that don't matter include the price of houses in suburbs.

Career prospects. Let me make this explicit: the chance that you will grow old in a lab is slim. Think of your lab career as a 5-year postdoc, and be prepared to reapply to faculty jobs.

Sure, labs will go through a great deal of effort to tell you that this is not true. But look around: the vast majority of old people in labs are either the big founder-of-the-group types, or people who are past faculty positions (they had one and left). The number of people who go through the lab system is significantly higher than the number of people who stay (which is great for young job seekers, as I mentioned above).

I am not sure what causes this empirical fact, but I can offer some hypotheses:
  • Tradition and culture encourage you to leave for universities at some point;
  • Labs reliably go through periods of instability, where things turn really bad, and most of the group leaves. Some labs never recover, while others recovered quite successfully, but with a significantly changed group (see IBM Almaden and ATT).
  • If you spend too many years in a lab, there is a danger that you fade into TCS oblivion. (You get integrated more into practical groups, and stop being a theorist; you lose contact with the recent trends; etc.)
  • The perspective of having students seems much more attractive once you get tired of the nitty gritty details and would like to delegate some.
Thus, my recommendation to job seekers is to consider labs as a very high-risk venture, compared to a job in a university. On the one hand, if you choose the lab option, you will most likely seek to move to a university later, so make sure that your track record stays very competitive. On the other hand, if you choose a university, CS departments are exceedingly friendly and rumour has it that tenure is easy to come by (even if your record as Assistant Professor is one notch lower than it was as a PhD student, when you were hired --- as someone put it, "people who get jobs at top places are brilliant, while people who get tenure at top places are very good").


Freedom for in-depth work. If you're reading the discussion above as saying that a university job is always better, you're missing the fact that "risk" is always two-sided. The one aspect where labs excel is the ability to do in-depth work.

As faculty, you have to worry about your classes, your students, and your grants. These will slowly push you to be more stable, reliable, and risk-avoiding (turning you from brilliant to very good). No matter how insensitive you are, chances are you will care about your students at least a little bit, and you will think twice before hanging their life paths on some super-risky venture. (Not to mention that you have to train them before they can join you on said super-risky venture.) And ignoring the students is also hard to do in practice; they drain a lot of mental energy from you.

In a lab, you have to worry about not getting fired, and nothing else. Theory-wise, you can take large risks if you care to (you're not affecting anybody else), and you don't owe anybody any significant share of your brain.


In-breadth work. As I showed above, labs are much better for in-depth, risky, theoretical work. How about if you fancy yourself doing non-theoretical work? The situation is exactly the opposite.

Labs are the ideal place to collaborate on standard problems with people across Computer Science. There are many people in the lab doing practical things. You get brownie points for working with them, and they are very interested in working with you (I think they also get brownie points). They don't have an empire of students that they can't really leave; they have time for you. Their problems make sense, i.e. you won't dismiss them as "practice" (as someone put it, in universities, most of "practice" doesn't apply to practice, which is a big turn-off for some theoreticians who try to stick their heads outside theory). You can question their model as much as you like, since lab people have the real data, and they are open to revising their understanding of the world if you see that data in a different light.

But if you want to go broader and riskier, you're out of luck. You simply cannot work in theoretical linguistics, both because there's nobody around you whom you can talk to, and because nobody in the management wants to hear about you doing this. And you cannot even work on anything too non-standard in computer science -- companies are surprisingly risk averse and slow. (You may remember the story of a theory guy who came up with the Google page-rank scheme around the same time; he was in an industrial lab, while the Google founders were at a university. Which of them turned this into a billion-dollar product?)

Thus, if you fancy yourself as revolutionary outside theory, go for a university (and start by getting tenure). If you want to solve really cool problems inside a bigger system, a lab is the place to be. In a university, you will simply never hear about those problems, and in a start-up you will not face them. (A start-up has to do one thing very well, and other things less than horribly; they don't have much variety for deep thinkers.)


Salary. If that matters a lot to you, I have bad news. Salaries in labs are really not what they're rumoured to be. They are, indeed, higher than in universities, but only marginally so. From the data points I have, this seems to hold true at any level (e.g. comparing middle-aged lab people with Associate Professors.)

In fact, be prepared to negotiate for any salary advantage. My initial offer was equal to the initial offer I got for a 9-month salary at a university! It got increased significantly after I laughed them off. (I heard a very good phrase that I recommend: "I certainly prefer to come to your place on intellectual grounds alone; but my wife would find it very strange that I am refusing this other job with such nice benefits." -- somehow HR people never argue with the stereotype of a wife interested in money :)


Intellectual honesty. In some sense, the lab offers you a great feeling of intellectual honesty: you have seen the real problems (even if you don't work on them). You have informed opinions on what is really important outside theory, and you may get to do such work yourself.

But the labs can also be awful on the same grounds of intellectual honesty. You have to ask for permission and IP-clearance before you publish anything. If you do anything that might be interpreted as important, there is a risk that you are forced to patent it (a horrible thing from my own ethical perspective, due to the way patents are abused and misused in the US). And if you want to write a library for the world to use, chances are it will be very hard to release it.


Personal freedom and appreciation of your work. People complain about boring faculty meetings, and other such annoyances in the life of a professor. Well, labs are worse. There are, for instance, the unbelievably ridiculous things, like hour-long mandatory courses on the effectiveness of sending polite emails to coworkers. As opposed to a professor, whose relation to the university is more like that of a member in a co-op, in a lab there are people above you who have no respect for you or your time.

Then there are the moderately ridiculous things, like cutting coffee to save money. Again, it comes down to the fact that the decisions are made by people who don't know you or have any respect for you.

Finally, there is the issue of travel. Faculty may complain (rightly) about the horrors of periodic begging for grants, but it is nowhere near as demeaning as begging for every single trip you want to make (whether your lab should pay, or some external source is paying). I'm not saying travel approvals are harder to write -- they are much easier than grant proposals; but their periodic and insignificant nature makes them demeaning. (This issue is especially bad during times of economic hardship. Muthu once mentioned a deal of the form "add $15K to my salary and I'll never file a reimbursement form." In certain times, the deal you get at labs is really like "we'll add $0 to your salary, and you never file a reimbursement form.")

Of course, objectively things are not bad. We have high enough salaries to afford coffee, and even all scientific travel when it comes to that (it's tax deductible, so it's half the dollar cost to you). It's all about the fact that somebody far above you sees no difference between your job and a dishwasher's.


There are probably many aspects of the lab vs. university comparison that I missed, so feel free to comment or ask.

Wednesday, September 9, 2009

SODA / Data Structures

The SODA list of accepted papers, with abstracts, is here.


Here are the clear data structures papers:
  • Fully-Functional Succinct Trees (Kunihiko Sadakane, Gonzalo Navarro)

  • On the Cell Probe Complexity of Dynamic Membership (Ke Yi, Qin Zhang)

  • Data Structures for Range Minimum Queries in Multidimensional Arrays (Hao Yuan, Mikhail J. Atallah)

  • A locality-sensitive hash for real vectors (Tyler Neylon)

  • Deletion Without Rebalancing in Balanced Binary Trees (Siddhartha Sen, Robert E. Tarjan)

  • Cache-Oblivious Dynamic Dictionaries with Optimal Update/Query Tradeoff (Gerth Stølting Brodal, Erik D. Demaine, Jeremy T. Fineman, John Iacono, Stefan Langerman, J. Ian Munro)

  • Cell-probe lower bounds for prefix sums and matching brackets (Emanuele Viola)
    + A Lower Bound for Succinct Rank Queries (Mihai Patrascu)

  • Applications of Forbidden 0-1 Matrices to Search Trees and Path Compression-Based Data Structures (Seth Pettie)

The following are also data structures papers, though they can be seen as straddling two areas:
  • A Nearly Optimal Algorithm for Approximating Replacement Paths and k Shortest Simple Paths in General Graphs (Aaron Bernstein) -- the main contribution is better preprocessing time of a data structure

  • Counting Inversions, Offline Orthogonal Range Counting, and Related Problems (Timothy M. Chan, Mihai Patrascu) -- improves the preprocessing and offline times of some data structures

  • Highway Dimension, Shortest Paths, and Provably Efficient Algorithms (Ittai Abraham, Amos Fiat, Andrew Goldberg, Renato Werneck) -- distance oracles are a very important topic in data structures; this paper is more about modelling, though.

  • On the Exact Space Complexity of Sketching and Streaming Small Norms (Daniel M. Kane, Jelani Nelson, David P. Woodruff) -- whenever you talk about streaming and care about time complexity, this is clearly a data structure; if you solve the problem with k-wise independent hash functions, even more so.

  • The 1+β Choice Process and Weighted Balls-into-Bins (Yuval Peres, Kunal Talwar, Udi Wieder) -- more on the power of two choices, though the process is probably unrealistic in a data structures context.

  • Regular Expression Matching with Multi-Strings (Philip Bille, Mikkel Thorup) -- stringology is close enough :)

  • Fast distance multiplication of unit-Monge matrices (Alexander Tiskin) -- also stringology.

I find the number of papers to be on the low side. There are a couple of rejected papers where I may disagree with the wisdom of the PC, but overall I think the numbers just reflect the fact that the field is old, and progress comes hard.

Disclaimer: Note that this list is not so closely correlated with my interests. There are data structure papers that I am not interested in, and many non-data-structures papers that I am interested in (even among my own papers, only half can be found above).

Thursday, September 3, 2009

What is efficient?

In my previous post, I wrote about how calling P "efficient computation" is nothing more than bad philosophy. I think a lot more examples are in order.


Are problems constant size? Opinions that P is efficient often stem from the following understanding of what computers are used for. Let's say that I have to optimize the bus schedules in NYC, and there are some 200 bus routes to consider. Certainly, an O(n^3) algorithm is worse than an O(n lg n) algorithm, but there is a really large gap between either of these and an O(n!) algorithm. Indeed, with an O(n^3) running time I will have to buy more/bigger computers, and let them run for a week instead of 30 seconds. But with an O(n!) algorithm, throwing more resources at the problem is simply not possible (I would need a computer the size of Earth). In general, as machines grow in power, more and more polynomial algorithms become usable, whereas exponential algorithms remain similarly useless.

The big fallacy of this argument is assuming that we know what we're doing. We don't! The problem, and the size of the problem, changes --- often faster than the speed of our computers.

If you give me an O(n^3) algorithm for the bus scheduling problem, all of a sudden I will realize that I can ask for more. Before, I was assuming that each bus route should have a fixed frequency, but certainly I can get better results (in real life) if I allow variable gaps between consecutive buses. All of a sudden, the number of variables in your problem will explode (e.g., there might be 30 cycles on the same bus route during the day). Why did it explode? Only because you told me that you had a faster algorithm.

Twenty years ago, nobody could envision problems of the scale that Google solves today. Now we can, and it's all a fault of the hardware guys. They managed to push storage capacity (disk and RAM) so much, that it makes sense to ask questions about billions of web pages.

For as long as Moore's law is (was) active, we can at best expect the memory size and the processor "power" to grow at the same rate. Even worse, processor power will probably lag behind: doubling the density will roughly double my RAM capacity, but algorithmically I will probably not be able to do double the computation on the CPU (it's hard to exploit parallelism perfectly). With an exponential gap between memory and CPU, it's only linear time algorithms that survive.


We don't know what we're doing. A related issue, highlighted in the buses example, is that we don't know what we ultimately want to solve. It depends on what we can solve. If I give you a faster algorithm, you will want to model your problem with more details, making my algorithm too slow.

The core problem that Internet routers have to solve comes down to nothing more than a binary search (each subnetwork has an IP range, and I have to figure out in which of these ranges my destination IP lies). Could we solve it in O(lg n)? With current data rates and traffic, we probably could (barely). But if we speed it up, we can now maintain that interesting statistic, which allows us to bill customers more accurately. And if we could speed it up some more, we have time-per-packet to fit that extra component, which will allow us to track virus infections ex post facto.
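
To make the reduction concrete, here is a minimal sketch of that search, under the simplest model I can state (disjoint, pre-sorted ranges; real routers do longest-prefix matching with more structure, so treat this purely as an illustration of the predecessor-search view):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Illustration only: each subnetwork owns a disjoint IP range [lo, hi], and
    // route lookup is a predecessor search over the sorted range starts.
    struct Range { uint32_t lo, hi; int route; };

    // Assumes 'ranges' is sorted by lo and the ranges are pairwise disjoint.
    int lookup(const std::vector<Range>& ranges, uint32_t ip) {
        // Find the first range whose lo exceeds ip, then step back one:
        // that is the only range that could contain ip.
        auto it = std::upper_bound(ranges.begin(), ranges.end(), ip,
            [](uint32_t x, const Range& r) { return x < r.lo; });
        if (it == ranges.begin()) return -1;        // no candidate range
        --it;
        return (ip <= it->hi) ? it->route : -1;     // -1 means "no matching route"
    }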

Functionality is never fixed. The more you can do, the more I will ask you to do. It makes equal sense to improve an O(n!) algorithm to O(2^n), or to improve an O(n^3) algorithm to O(n^2), or to improve hash tables by 20% (they're already constant time!). In all cases, you will notice that the problem size was just at the limit of what you could handle, and the new algorithm will make somebody very happy.

Apart from the many examples that I heard about at ATT, I can give another. I am working with an OR guy, the kind of person who teaches at Business School and spent significant time optimizing truck schedules for Amazon (not in theory, but actually chasing 2% improvements that they implemented). Do you know what problem he brought to me? He wants to improve an O(n lg n) algorithm.

I remember a quote from David Karger in Advanced Algorithms. He said that one week after you improve a running time from O(n^2) to O(n lg^2 n), you should celebrate. But two years later, you should work on improving it to O(n); most likely, it is a very interesting problem by now.


Intellectual curiosity. Ultimately, all arguments about reality cannot compete with our (very developed) sense of the aesthetic. What lives and dies in a branch of mathematics is determined by what triggers our intellectual curiosity.

My firm belief is that whenever you have a clean and elegant problem, there will be some fascinating behavior, in the eyes of somebody who loves to understand computation, at any one of the levels of the running-time hierarchy.

I have heard the claim that P-vs-NP is the most interesting question, because it's the root node in our binary search tree for understanding problems --- the first thing we ask is whether this problem is in P or not, and then proceed with two different sets of techniques depending on the answer. The root node of the classification tree has to be the most interesting.

But this argument pales when you remember that we're really chasing the intellectually challenging cases. For many problems, a 5th grader who just learned programming can give a polynomial algorithm. This doesn't tell me anything interesting from a scientific perspective. The most interesting behavior happens right at the phase transition where the problem becomes hard. For some problems, it's at O(lg lg n), while for others it's at O(2^n).


A problem in current TCS. Of course, I cannot write a post without being provocative :) So let me state what I view as a significant problem in current TCS.

There is a field of study, namely approximation algorithms (which includes such newer branches as mechanism design), in which you can do fine without much understanding or appreciation of computation. The name of the game in this subfield is rounding LPs and SDPs, a task which requires geometric skills and discrete Maths skills, but not much algorithmic skill. Heck, many papers in the area don't even have a for loop in them! The only interesting algorithmic content is solving the LP or SDP, which few people really work on.

This has led to a horde of TCS students with very little appreciation of algorithmic phenomena, who basically learned how to hide them all away under the "polynomial" hood. (I'm sure you've met people who are surprised at the idea of storing some forward and backward pointers between two pieces of information for navigation throughout the algorithm...)

I hope this doesn't ruin the future of our field. Once there's a significant group of such people, we could easily diverge into creating useless problems that can be solved by these techniques, and lose track of the fact that we were supposed to do algorithms and complexity. (As I said before, the field evolves based on what the field finds exciting. Another way to put it is that researchers are often in the business of drawing targets around the arrows that they manage to shoot, and I can see these new targets drifting away from the original big one.)

The encouraging bit is that the standards for success in approximation algorithms (for old researchers) still involve crystal clear algorithmic thinking. Consider, for instance, the two recent algorithms researchers who won the Gödel Prize: Dan Spielman and Shang-Hua Teng. They moved from a major polynomial-vs-exponential result with mostly mathematical content (smoothed complexity of LP) to a much more algorithmic question, approximating the sparsest cut in linear time. This latter problem has picked up quite a bit of steam in recent years.

Wednesday, September 2, 2009

Food for thought

Here is a short movie (with subtitles) on Romanian parents whose kids emigrated to the US. It is hilarious, at least for people who have gone through this.


Funny enough, my own parents went through an abridged version of this, even though my mother has a Bachelor's in Computer Science. She did her university studies just as Computer Science was introduced, and she got a degree after writing FORTRAN programs on punch cards. After graduating, her parents convinced her that there was no future in this discipline, so she became a mathematician.


Here is a funny article on religion published in the Cosmopolitan in the 1920s. The author was scared by religious fundamentalists who tried to introduce bills limiting the teaching of evolution. Sound familiar?

When it comes to social issues, I am always more comfortable in Romanian circles than in the US. I never feel the need to engage in a conversation on the existence of God, since I genuinely consider any person who believes such a thing to be intellectually inferior --- he prefers to rationalize a feeling rather than "truly feel" a logical conclusion. (Of course, my position is not politically correct in the US.) The truly interesting question, to me, is whether we should allow discrimination on religious grounds --- after all, would you want a professor who has proven himself to be intellectually inferior? While I cannot give a definite answer, I think it is OK to accept religious people as scientists, since there is a lot of technical work in science --- i.e. there are many useful skills, and they seem to be distributed independently of the particular feature on which religious people show inferiority. Thus, rejecting religious people would reduce the pool of talent, rather than concentrate it.


Via Luca, here is a bad philosophical piece on how efficient algorithms show that God exists. There are, of course, many cringe-inducing examples of bad philosophy that abuses concepts in computer science. This is just one that I saw today.

To quote some examples from last week, consider some talks from the Barriers workshop at Princeton. "P vs NP is really the deepest question in Mathematics, since it talks about the essence of Mathematical proofs." "The decades since P-vs-NP was posed really convinced us that this is the right question to ask about computational efficiency." "P!=NP is a law of the universe."

Now, "computational efficiency" means many things in many contexts. If you're sitting inside ATT routers, O(lg n) may be trivial to achieve, but a prohibitive cost. If you're aggregating over Google's data, O(n2) may be trivial, but prohibitive. And if you're optimizing the bus schedules in NY, O(n!) may be trivial, but prohibitive.

P is not the universal definition of "efficient." The P vs NP question is a misnomer for the question "Can you beat backtracking for the following list [...] of problems?", which is, by all accounts, a fascinating question. But substituting P with "efficient" and talking about the universe is just bad philosophy abusing sound concepts in Computer Science.

Any attempt to ascribe metaphysical (or "metamathematical") meaning to this question is, like religion, a failure to accept the most plausible interpretation of reality, leading to an urge to replace it with a feel-good half-logical explanation.

Friday, August 28, 2009

Romania

I saw two recent news items that reflect on Romania.

Lance blogs about the ISI Web of Science. I first heard of this thing a few years ago, when the Romanian government instituted paper counting as the means for promotion in any public university. The flavor of paper counting was that you only count "articles" in ISI "A-ranked" publications. This means, for instance, that I could not get a job there, since all my journal papers are in special issues (so they don't count, plus I don't really have a clue which TCS journals are A-ranked).

Now, the basic idea of the Romanian government was not bad. The Romanian university system is fundamentally screwed up on all levels (I will post about this later). Anything that will force them to engage with the rest of the world is generally a good idea. For instance, I would support the idea of having committees that decide minimal area-specific requirements. Say, "you may not apply for a faculty job in TCS group before you have at least 2 papers in STOC, FOCS, SODA, or SoCG." The idea would not be paper counting, but forcing people to notice the outside world (they might like it, and stay involved). Anything except a formal requirement is unlikely to work (since all senior faculty never heard of the outside world, you cannot base the system on recommendation letters, or the idea that a hiring committee will make intelligent decisions).

But the ISI implementation of the idea turned out to be quite wrong. It does not work in TCS, for instance, and it is abused in the other disciplines, as well. The common strategy is to fight hard to get some Romanian journal in the ISI list (say, "Modern Metallurgy" as a somewhat fictional example). Then, accepting papers in that journal is no longer a question of merit, but one of connections (yes, of course your cousin's daughter will get a job in our department) and bribe (yes, editor, of course I will be glad to have you as co-investigator on my compiler optimizations grant). The journal starts being filled with articles like "Red-Black trees with Faster Deletions, and Applications to Steel Production."


On to the second piece of news. BBC and others ran articles on how Madonna was booed at her own concert in Romania, when she spoke against discrimination against gypsies. I thought it would be good to discuss some context for that.

Discrimination against gypsies has its roots in the vanilla-flavored discrimination against any minority which is poor, uneducated, and prone to engaging in criminal behavior. As with the black community in the US, the most frequent critique that you hear in Romania is that the "uneducated" feature was by choice, i.e. it stems from the traditional values of the community. There is probably more of a case for this in Romania (where the communist regime made education mandatory, showered students with fellowships, and made all decisions based on anonymous written exams) than in the US (where many inner city schools are in disarray).

As far as I can tell, education or a stable job eliminates discrimination; we certainly had gypsy teachers in high school, and I never heard any negative comment about it. Again this is different from the US, where affirmative action programs have created a way to rationalize discrimination, even, say, against black MIT alumni.

In any case, contemporary discrimination against gypsies is driven by something quite different: a certain segment of the European media and politicians. Today, there is wide-spread (and very vocal) discrimination against Romanians in some European countries, with extremist politicians calling for mass deportations and things like that.

Again, this is vanilla-flavored discrimination against the large immigrant group who is willing to do more work for less money. But as wage and unemployment numbers don't make good ratings, the media show is built on specific examples: the accordion-playing beggars with 3 naked children in their arms, the pick-pockets, the guys who ate the swans from some park, the shanty houses outside Rome...

The problem? These people are usually gypsies bearing Romanian passports. These highly publicized examples have created a great deal of resentment among Romanians, both towards the Europeans who fail to understand the local ethnic differences, and towards the easier target, the gypsy community.

To illustrate this feeling, I attach a song from a well-known Romanian hip-hop band entitled "Message for Europe." (Warning: strong language, especially in Romanian; the English subtitles are not too good, and nothing can really do justice to the brilliant lyrics -- but you get the idea.)

Wednesday, August 26, 2009

Longest Multipermutation

Here's an algorithmic problem for your enjoyment:

Define a multipermutation to be a sequence of integers that contains all numbers from 1 to some K at least once. Examples include (1,3,1,2,2), or (1,4,3,2); but not (1,2,3,5,5).

Given an array of n integers, find the longest multipermutation in O(n) time.
I saw a recent paper doing this in superlinear time, and of course I had to find a linear-time algorithm :) Can you also find one?
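
To pin down the definition (and only the definition -- this is not the solution), here is a small checker I wrote, reading the examples as saying that every element must lie in 1..K, where K is the maximum of the sequence:

    #include <algorithm>
    #include <vector>

    // Checks the definition as I read it: all elements lie in 1..K, where K is the
    // maximum, and every value 1..K appears at least once. (Illustration only.)
    bool is_multipermutation(const std::vector<int>& a) {
        if (a.empty()) return false;
        int K = 0;
        for (int x : a) {
            if (x < 1) return false;              // values below 1 are never allowed
            K = std::max(K, x);
        }
        if (K > (int)a.size()) return false;      // too few elements to cover 1..K
        std::vector<bool> seen(K + 1, false);
        for (int x : a) seen[x] = true;
        for (int v = 1; v <= K; v++)
            if (!seen[v]) return false;           // some value in 1..K is missing
        return true;
    }

On the examples above: (1,3,1,2,2) and (1,4,3,2) pass, while (1,2,3,5,5) fails because 4 is missing.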

Sunday, August 23, 2009

A Technical Lemma

[You can also read this post in PDF.]

In a recent SODA submission, I had to prove a small technical lemma that I considered more or less straightforward. But what exactly is straightforward is in the eye of the beholder, and one of the referees did not like my (succinct and semicoherent) proof. I was thus asked to provide as detailed an explanation as I could, pronto. (Sound familiar? I maintain that we are all too often missing the forest for the trees, but that's a topic for another post.)

Anyway, I thought I would take the opportunity to explain this small technical lemma to my readers who fancy themselves working with entropy inequalities in the future. If you do lower bounds, such things will sooner or later feel quite standard (as standard as, say, maintaining some forward and backward pointers in your algorithm). So if you fancy yourself working in this area, you might consider going through this proof carefully as an exercise.

(A word of caution: Don't be discouraged if this seems overly technical. This is not "what we do" in lower bounds — the point is to discover non-straightforward stuff. Piano skills are not what makes a great composer.)

So, what exactly are we proving? Let A[0..n-1] be a bit vector, chosen uniformly at random. Let Q(i)=A[0]+...+A[i] be the partial sums of the vector (these were queries in my original problem).

I imagine the vector A being split into k blocks of n/k bits each. I denote by Q_Δ the Δ-th partial sum inside each block, i.e. the vector (Q(Δ), Q(Δ + n/k), Q(Δ + 2n/k), ...).

We now look at Q_0 (the first query in each block) and Q_Δ for some arbitrary Δ. The main technical question is: how much more efficient is it to encode Q_0 and Q_Δ jointly, as opposed to encoding them separately?

Lemma 1. H(Q_0) + H(Q_Δ) − H(Q_0, Q_Δ) = Ω(k)

Proof. The lemma is simple and intuitive. First remark that H(a, a+b, a+b+c) = H(a, b, c); furthermore, this is H(a)+H(b)+H(c) if a, b, and c are independent. Thus, H(Q_0) = H(sum in block 1) + H(sum in block 2) + ... Of course, the sum in a block is a binomial with n/k Bernoulli trials. Thus, H(Q_0) = (k-1) * h(n/k), where I use h to denote the entropy of a binomial.

Just the same, H(Q_Δ) = h(Δ) + (k-1) * h(n/k). Thus H(Q_0) + H(Q_Δ) ≈ 2k * h(n/k).

Finally, H(Q_0, Q_Δ) = h(Δ) + h(n/k - Δ) + h(Δ) + h(n/k - Δ) + ... ≈ k*h(Δ) + k*h(n/k-Δ). Indeed, I can break (Q_0, Q_Δ) into 2k independent components: the sum up to Δ, the sum from there up to n/k, the sum from there up to n/k + Δ, the sum from there up to 2n/k, and so on.

To conclude, we look up the entropy of a binomial on Wikipedia and find that h(m) = 0.5 * ln(πem/2) + O(1/m). Thus, doubling the number of trials m increases the entropy by ln(2)/2 - O(1/m) = Ω(1) bits. The lemma follows: either Δ ≥ n/2k, and we have h(n/k) - h(n/k-Δ) = Ω(1), or otherwise h(n/k) - h(Δ) = Ω(1). □
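
(For completeness, the doubling claim is just the difference of two instances of that approximation:

    \[
    h(2m) - h(m) = \tfrac{1}{2}\ln\frac{\pi e \cdot 2m}{2} - \tfrac{1}{2}\ln\frac{\pi e m}{2} + O(1/m) = \tfrac{\ln 2}{2} + O(1/m),
    \]

and since h is increasing, h(m) - h(m') ≥ ln(2)/2 - O(1/m) whenever m ≥ 2m', which is the form used in both branches of the case analysis.)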


Now we get to the interesting part. I need to prove that there is a lot of loss in a separate encoding of Q0 and QΔ, even in a probabilistic universe where I've conditioned on some event. Of course, I need a lower bound on the probability of the event (for instance, if the event is A[0]=A[1]=...=0, all entropies are zero, so I can't prove anything).

How rare an event can I handle? The magic answer is Pr[E] ≥ 2^(-εk), for some small constant ε. Why is this the right answer?

  • First, I should be able to handle such an event. If I have some variable X with some entropy H(X), and I tell you some event E has happened, how much did I knock off the entropy of X? You've just learned roughly lg(1/Pr[E]) bits of entropy (the fact that E happened). Intuitively, H(X) cannot decrease by more than what you learned. Thus, if I had a lower bound of Ω(k) on some entropic quantities, I should be able to maintain it under an event of measure 2^(-εk).

  • Second, conditioning on such an event E is a very frequent trick in lower bounds — and 2^{-εk} is always the answer :) The event E fixes something (say, some M bits of memory have been read by the algorithm, so Pr[E] ≈ 2^{-M}). How can I bound the amount by which E simplified the problem? (In particular, I need to show the answer is not already known, otherwise there's no more lowerbounding to do.)

    A good way to bound the help E can render is to break the problem into 99*M subproblems. Intuitively, the M bits that the algorithm knows cannot help with 99*M independent subproblems; for some of them, the algorithm must be clueless. Thus, the logic goes in the other direction: I'm considering k blocks precisely because I need to argue about an algorithm that has learned εk bits, i.e. I am in a universe where we condition on an event of probability 2^{-εk}.
Thus, what I really want to prove is:

Lemma 2. For any event E with Pr[E] ≥ 2^{-εk}, we have H(Q0 | E) + H(QΔ | E) − H(Q0, QΔ | E) = Ω(k).

Proof. Remember the intuition from above: if I tell you that some event E happened, I am affecting your entropies by lg 1/Pr[E]. There's only one problem in implementing this intuition, namely that it's false. Imagine the following distribution: on odd days, a uniformly random vector of n bits; on even days, the vector (0,...,0). The entropy, which is an average, is n/2, i.e. quite high. But if I condition on an event with probability 1/2, I can bring the entropy down to zero.
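To see this counterexample numerically (a toy illustration with n = 16, nothing more):

    from math import log2

    def H(dist):
        # Shannon entropy (in bits) of a distribution given as {outcome: probability}.
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    n = 16
    # "Odd days": a uniformly random n-bit vector; "even days": the all-zeros vector.
    mixture = {v: 0.5 / 2**n for v in range(2**n)}
    mixture[0] += 0.5
    print(H(mixture))      # about n/2 + 1 bits for n = 16, i.e. quite high

    # Condition on the probability-1/2 event "today is an even day":
    print(H({0: 1.0}))     # 0 bits: the conditioning wiped out all the entropy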

What I need is concentration for entropies. And how do we always prove concentration? By using independence, which, luckily, we have here — remember how we broke the entropies into independent binomials. Of course, the formal implementation may not seem obvious yet, so let's do it carefully.

Let's assume Δ ≤ n/2k by symmetry. In this case, Lemma 1 boils down to: H(Q0) + H(QΔ) − H(Q0, QΔ) ≈ 2k * h(n/k) - k*h(Δ) - k*h(n/k-Δ) ≥ k * (h(n/k) - h(Δ)) = Ω(k), using h(n/k) ≥ h(n/k-Δ). I want to exploit the independence of k quantities, each with expectation h(n/k) - h(Δ).

These quantities come from the definition of entropy. Remember that H(X) = E[ lg 1/p(X) ], where p(.) is the probability density function of X. Let's make the following convenient definitions:
  • Xi = the sum of the bits in the i-th block. Let X = (X0, X1, ...).

  • Yi = the sum of the first Δ bits in the i-th block. Let Y = (Y0, Y1, ...). Denote the probability densities for Y by q(.).

In the regime Δ ≤ n/2k, we used the lower bound H(Q0) + H(QΔ) − H(Q0, QΔ) ≥ H(X) - H(Y). Thus, I am interested in analyzing H(X|E) - H(Y|E).

To this end, I write H(X) - H(Y) = E[ lg q(Y)/p(X) ] = E[ lg (q(Y1) * q(Y2) * ...) / (p(X1) * p(X2) * ...) ] = E[ lg q(Y1)/p(X1) + lg q(Y2)/p(X2) + ... ].

This is in the desired form, the sum of k independent quantities, since (X1, Y1) is independent of (X2, Y2), etc. Each term has expectation h(n/k) - h(Δ) ≥ ln(2)/2 - O(k/n), as we showed in Lemma 1. Thus, I can apply Chernoff and conclude that the sum is Ω(k) with probability 1 - 2^{-Ω(k)}.

This probability is so large that an event happening with probability 2^{-εk} does not scare me any more. Indeed, even if I condition on E, the sum is Ω(k) with probability 1-o(1). Thus, its expectation is also Ω(k).
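If you want to see the concentration with your own eyes, here is a rough Monte Carlo illustration in Python (an illustration only, not part of the proof; the values of n, k, Δ and the number of trials are arbitrary):

    import random
    from math import comb, log2

    def experiment(n=4096, k=64, delta=16, trials=500):
        # Sample the sum of the k independent terms lg( q(Y_i) / p(X_i) ).
        b = n // k                                                 # block length n/k
        p = [comb(b, s) / 2**b for s in range(b + 1)]              # pmf of X_i ~ Binomial(b, 1/2)
        q = [comb(delta, s) / 2**delta for s in range(delta + 1)]  # pmf of Y_i ~ Binomial(Δ, 1/2)
        sums = []
        for _ in range(trials):
            total = 0.0
            for _ in range(k):
                block = random.getrandbits(b)                   # one random block of b bits
                x = bin(block).count("1")                       # X_i = sum of all b bits
                y = bin(block & ((1 << delta) - 1)).count("1")  # Y_i = sum of the first Δ bits
                total += log2(q[y]) - log2(p[x])                # lg( q(Y_i) / p(X_i) )
            sums.append(total)
        mean = sum(sums) / trials
        print("average sum:", round(mean, 1), " per block:", round(mean / k, 3))
        print("fraction of trials below half the average:",
              sum(s < mean / 2 for s in sums) / trials)

    experiment()

The per-block average should come out close to h(n/k) - h(Δ) (about one bit for these parameters), and trials falling far below the average should be essentially nonexistent.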

Have I just shown H(X|E) - H(Y|E) = Ω(k)? Well, not quite. I've shown E[ lg q(Y)/p(X) | E ] = E[lg 1/p(X) | E] - E[lg 1/q(Y) | E] = Ω(k). But the probability density function p(.) is just another function when I condition on E. Under this conditioning, the real probability density is another function p'(.), i.e. H(X|E) = E[ lg 1/p'(X) ].

First remark that the probability mass inside E is a subset of the mass in the general universe, i.e. p(x) ≥ p'(x) * Pr[E], for any x. Thus p'(x) ≤ p(x) * 2^{εk} and lg 1/p'(x) ≥ lg 1/p(x) - εk. So I can switch between p'(.) and p(.): E[lg 1/p'(X) | E] ≥ E[lg 1/p(X) | E] - εk.

Switching between q(.) and q'(.) is trickier, since I want an upper bound. It is true that for many y, q'(y) could be much smaller than q(y), thus increasing the "entropy" lg 1/q'(y). Let's call y bad if q'(y) ≤ q(y) / 2^{2εk}. How much mass can the bad y's occupy? We have Σ_y q(y) = 1, so Σ_{y∈Bad} q'(y) ≤ 2^{-2εk}. Thus, even inside the event E, the bad y's only have mass 2^{-2εk} ≤ 2^{-εk}. Now H(Y|E) is an average over the good y's and the bad y's. For the good, lg 1/q'(y) ≤ lg 1/q(y) + 2εk. For the bad, lg 1/q'(y) ≤ n (the smallest event in my distribution has probability 2^{-n}). But the weight of the bad inside E is 2^{-εk} = o(1/n) for reasonable k. Thus, the contribution of the bad is o(1).

We have thus shown E[lg 1/q'(Y) | E] ≤ E[lg 1/q(Y) | E] + 2εk + o(1). We conclude that H(X|E) - H(Y|E) ≥ E[ lg q(Y)/p(X) | E ] - 3εk - o(1) = Ω(k). □


That's it. My hope was to present the proof as a sequence of interesting ideas, rather than a long technical thing filled with calculations. I don't know if I succeeded in conveying this feeling; let me know if you need help with any of the steps.

Friday, August 21, 2009

An Evening in Bucharest

5:30pm. I am at the "Brewers' Court," meeting with an old friend from computer olympiads, now faculty at Bucharest Polytechnic. We order 1-liter beers and I describe an algorithm answering the main open problem from a recent paper. The first question I get is quite polite (I tend to get a lot of respect in Romania), but can roughly be translated as "Dude, does this make any sense?" As is often the case, I show that I cannot focus without coauthors. My algorithm is obviously nonsense, except that it contains enough original ideas that a correct algorithm can be reconstructed in 10 minutes.

7:00pm. I take a few more steps on a path that I have been pursuing for a long time: I try to communicate some of the energy, passion, and standards of US research (sorely lacking in the parts of Europe that I have had contact with). We both have 1-liter beers.

8:00pm. My godfather arrives. We all have 1-liter beers, and talk about work in networking (he is working in software at a major corporation in this field; I am at ATT, of course).

9:00pm. Another friend from high school (CN Carol I, class of 2001) arrives. We have 1-liter beers with some food, and gossip about another high-school friend (now CEO of a software company).

11:00pm. Two Romanian re-pats (old friends from MIT) arrive. We have 1-liter beers, and talk about their start-up, the child that one has, etc.

12:30am. Another re-pat (Harvard) arrives. We have 1-liter beers, and talk about the cars that they drive at 150mph, the good old times at Physics Olympiads, and life.

1:30am. Yet another re-pat (MIT) arrives. We have 1-liter beers, and talk about the hedge fund where he works, the likely inflation for the dollar in the next 10 years, and life in Romania vs the US.

3:30am. I make it to my sister's apartment and write blog posts (at the end of my European trip, I am still jet-lagged).

Monday, August 3, 2009

More on Problem 2

You might be wondering what caused the hiatus in posting CEOI problems. My modus operandi as a researcher is to indulge in whatever obsession I currently have. For the past week, the raging obsession has been Problem 2 (tempered, of course, by the bureaucracy of joining AT&T).


Several people found the O(n^4) solution, both during the real competition and on the blog. In addition, Berman, Karpinski, and Lingas found the same solution in a recent arXiv paper. (Thanks to Nate Xiaoming Xu for telling me about this paper!) Their paper is not about rectangle covers, but about wireless antennas. This goes to show that olympiad problems are often deep enough to become research papers. The only additional skill that we have in research is keeping a straight face when we write the introductions. (Don't worry, it comes with age.)

As I announced in the comments to the previous blog post, I found an O(n^3) solution. This appears to be tricky (several people tried to reconstruct my solution in the comments, but without success). My solution is described in this post.

I am still quite optimistic that O(n^2) might be possible, but so far I haven't been able to beat cubic time. If you find a way, I will be impressed.

If you don't find a way but want to discuss promising ideas, use the comments to this post. Think of this as an experimental poly-CS project. The discussion is open to anyone. Don't be shy to post, but please use a nickname for identification ("CarpathianBlackMetal" is much more informative than "Anonymous"). Also number your comments (e.g. "This is comment #4 from CarpathianBlackMetal"). Do not delete a comment if you posted something wrong. Explaining why it is wrong is much more informative for the audience.

Now let me describe the solutions I know for this problem.


Observation 1. In an optimal solution, any two rectangles are either disjoint or "nested" (one of them is taller and has the base included in the other). If two rectangles were to intersect properly, the shorter one can be shrunk horizontally until the rectangles become disjoint.

Let the points have coordinates (x[i], y[i]). Assume they are sorted by x.

Observation 2. We can construct an optimal solution in which every rectangle satisfies:
  1. it extends horizontally from some x[i] to some x[j];
  2. it extends vertically up to A / (x[j] - x[i]), i.e. as high as possible.
  3. it covers points i and j, i.e. y[i], y[j] ≤ A / (x[j] - x[i]).
Perhaps the only nonobvious condition is 3. But note that if the rectangle doesn't cover point i, I might as well move its edge inside (beyond point i), and thus increase the height. The only change that could happen would be to cover more points.

An O(n^4) solution. We thus know that there are only O(n^2) distinct rectangles to consider. Denote them by R(i,j), though not all pairs (i,j) yield valid rectangles.

If R(i,j) is used in the solution, we have a well-contained subproblem to solve: cover all points between x[i] and x[j], which are not already covered by R(i,j), that is, they are above the rectangle. Let A(i,j) be the minimal number of rectangles needed to cover all such points. This is our dynamic program.

We fill the dynamic program by increasing values of j-i. To compute A(i,j), let S be the set of points k (i < k < j) which are not covered by the rectangle. Note that i and j are always covered (by condition 3 above). Thus, any information pertinent to covering S has already been computed by our dynamic program.

Let S = {X_1, X_2, ...}. To compute A(i,j), we need to find splitters a < b < c < ... which minimize A(X_1, X_a) + A(X_{a+1}, X_b) + A(X_{b+1}, X_c) + ...

This is a classic dynamic programming problem itself, or, if you wish, a shortest paths problem where A(x,y) is the length of the path from node x to node y. This subproblem can be solved in O(n^2) time (which is "linear time", since there are n^2 subproblems A(x,y)). Thus, the original dynamic program is filled in O(n^4).

An O(n^3) solution. A common strategy for improving the running time is to try to maintain more information. Thus, I first reformulate the dynamic program above to use O(n^3) state.

Let T(i,j,k) be the optimal number of rectangles to cover all points (x,y) with x[i] ≤ x ≤ x[j] and y ≥ heights[k], where heights contains all y's in sorted order. Unlike the previous solution, I consider all heights k, not just the maximal one for a fixed i and j.

This dynamic program is easy to fill in O(n^4); a brute-force memoized sketch appears right after the two cases below. For some triple (i,j,k), I will either:
  • draw a rectangle from x[i] to x[j] of maximal height. If this height is more than heights[k], I get the optimum for covering the remaining points from T(i,j,new height).

  • if there is no rectangle spanning x[i] to x[j], there must be a break point m, such that T(i,j,k) = T(i,m,k) + T(m+1,j,k). Iterate to find the optimal such m.
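Here is that sketch, in Python. To be clear about what it is and is not: this is my own illustrative code, a naive memoization of the two cases above, without any of the speedup below; it even works with exact rectangle heights instead of rounding them to point heights, so it is slower than the O(n^4) filling. The example points and area at the end are made up.

    import math
    from functools import lru_cache

    def min_rectangles(points, A):
        # Minimum number of rectangles (base on the x-axis, area <= A) covering all points.
        # Brute-force memoization of the recurrence above; assumes all y-coordinates are positive.
        pts = sorted(points)                      # sort by x
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        n = len(pts)

        @lru_cache(maxsize=None)
        def T(i, j, h):
            # Fewest rectangles covering every point with index in [i, j] and y > h.
            if not any(ys[m] > h for m in range(i, j + 1)):
                return 0
            best = math.inf
            # Case 1: some rectangle spans the whole range x[i]..x[j]; by Observation 2
            # we may take it of maximal height A / (x[j] - x[i]).
            width = xs[j] - xs[i]
            H = math.inf if width == 0 else A / width
            if H > h:
                best = 1 + T(i, j, H)             # cover whatever sticks out above H
            # Case 2: no rectangle spans the whole range, so some break point m exists.
            for m in range(i, j):
                best = min(best, T(i, m, h) + T(m + 1, j, h))
            return best

        return T(0, n - 1, 0.0) if n else 0

    # Example: with area A = 4, one wide rectangle of height 2 covers (0,2) and (2,2),
    # and a thin, tall one covers (1,5); the answer is 2.
    print(min_rectangles([(0, 2), (2, 2), (1, 5)], 4))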
To fill this in O(n^3) requires a nice trick. The order of the computation will be: by increasing value of j-i, and, for some fixed (i,j), by increasing k.

To compute the optimal T(i,j,0), proceed exactly as above, taking time O(n). The rest of T(i,j,k) for k>0 can be computed in constant time per entry. For each T(i,j,k+1), I have two possibilities:
  • T(i,j,k+1) = T(i,j,k).

  • T(i,j,k+1) < T(i,j,k). Thus, the optimal solution for k+1 is different from the optimal solution for k. But the sets of points that must be covered by these two solutions differ by only one point P (with y[P] = heights[k]). Thus, point P must not be covered by the solution for T(i,j,k+1) — if it were, the better solution for T(i,j,k+1) would also hold for T(i,j,k). Thus, x[P] is a break point in the solution T(i,j,k+1), and we have T(i,j,k+1) = T(i,P-1,k+1) + T(P+1,j,k+1).

Good luck thinking about this problem!

Sunday, July 19, 2009

CEOI: Problem 2

The discussion on Problem 1 was quite active; I even posted a lower bound for the streaming complexity of the problem. The official solution is:

The basic idea is to sweep through each line i, and maintain the "height" of each column, i.e., the number of ones in that column extending upwards from line i. Given the heights of the columns, one has to find a subset S of columns maximizing |S| · min(S). You can find this subset by trying all values for min(S) and counting how many heights are greater than or equal to it. By maintaining the heights in sorted order, this computation becomes linear time.
To make the algorithm run in O(NM) time, the key observation is that, as the sweep line advances to the next row, each height is either incremented or reset to zero. This means that one can maintain the sorted order in linear time: place the values that become zero up front, and keep the order of the rest (which is unchanged by incrementing them).
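Since the algorithm is so short, here is a Python sketch of it (my own code, not the official one; for simplicity it re-sorts the heights at every row, so it runs in O(NM log M) rather than the O(NM) described above). On the example matrix from the Problem 1 post below, it returns 21, as promised there.

    def largest_rectangle(matrix):
        # Largest all-ones rectangle obtainable after permuting columns.
        m = len(matrix[0])
        heights = [0] * m          # ones extending upward from the current row, per column
        best = 0
        for row in matrix:
            heights = [h + 1 if c == 1 else 0 for h, c in zip(heights, row)]
            # For rectangles ending at this row, try every possible value of min(S):
            for size, h in enumerate(sorted(heights, reverse=True), start=1):
                best = max(best, size * h)    # |S| * min(S), with S = the 'size' tallest columns
        return best

    photo = ["001010", "111110", "011110", "111110", "011110",
             "111111", "110111", "110111", "000101", "010101"]
    print(largest_rectangle([[int(c) for c in row] for row in photo]))   # prints 21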
Let us now move to a problem of average difficulty. Originally, I was worried that it would be too easy (the contestants really know dynamic programming), but it seems the details behind the dynamic program were unusual enough, and some people missed them.


Problem: Photo. You are given a skyline photograph taken during the night. Some rooms still have the light on. You know that all the buildings can be modeled by rectangles of surface area at most A. Find the minimum number of buildings that can lead to the picture.

In other words, you are given an integer A, and N points at integer coordinates (x,y). You must find a minimum number of rectangles with one side on the x-axis and area at most A, which cover all points. The rectangles may overlap.

Desired running time: polynomial in N.

Friday, July 17, 2009

CEOI: Problem 1

As I've announced before, I was the chair of the Scientific Committee at The 16th Central European Olympiad in Informatics (CEOI'09). I will have a few non-technical posts on the experience, but in the next 6 posts, I will simply let you enjoy the problems.


I would encourage all my theory friends to think about these problems. What you might gain:
  1. You can form your own impression on the level of these contests (I think it is quite high, and it will allow us to recruit many researchers down the road. Consider that one has to solve and implement 3 problems in 5 hours.)
  2. These are great algorithmic puzzles, of the kind that is hard to come by (as opposed to math puzzles, which are everywhere, but I find boring).
  3. You may finally answer the question: Are you smarter than an 11th grader? :)
Please feel free to use the comments section. The official solution to each problem will be posted together with the next problem.

To start lightly, today I will show you the easiest problem. (It is perhaps easy enough for an exam in Introduction to Algorithms... What do you think?)

Problem: LOGS. Given an N by M binary matrix, find the area of the biggest rectangle consisting entirely of values of 1, knowing that you can permute the columns of the matrix.

Desired running time: O(MN).

For example, if the matrix is:
001010
111110
011110
111110
011110
111111
110111
110111
000101
010101

The answer should be 21: by permuting the columns such that columns 2, 4, and 5 are adjacent, you have a rectangle of area 21 (rows 2-8 and columns 2, 4, 5).

PS: Yes, today's my birthday. Only 13 more years to win the Nevanlinna Prize :)

Friday, June 26, 2009

The Conceptual Wars

Scott Aaronson, in his regular post-FOCS-notification column, has named me a leader of the Technicians Resistance Army of TCS (a leader, at least, in irășcibility).


As Anup points out, our guerrilla force shall strive to remain concealed in the shadows of the dark STOC/FOCS committee rooms. But, as an exposed member, I must accept my destiny as a preacher of our movement during the coming conceptual wars.

My basic plan is to sing the ideas of theoretical computer science from the rooftops [1]. I want to convince you that we have a destiny grander than communicating with quantum aliens. I want to remind you of the beauty of the algorithm. I want to tell you mystical prophecies about the depth our field will achieve when it shows, e.g., that max flow needs superlinear time. And, yes, I want to reprehend you for behaving like a bitter fringe mathematician instead of accepting the place of TCS as the New Maths... Or for having such low self-esteem that you will gladly wipe the floors of the Economics department for some attention. And, against all odds, I want to convince you that solving hard problems is more rewarding than inventing easy new ones.

***

Just, please, hold off the war for a few weeks. For now, I am the chair of the Scientific Committee of the CEOI (Central European Olympiad in Informatics). These kids have prepared for years to get here, and we owe them some exciting problems (which will certainly find their way to the blog later).


–––––
[1] This cool quotation comes, I believe, from Scott's job application. I learned of it in a bar from a friend who was on the job market at the same time as Scott. My friend added "I don't have this [...] in me."

Thursday, June 4, 2009

Conceptual Contributions Conference

At STOC, there was an announcement of a new conference dedicated to conceptual contributions, titled ICS (Innovations in Computer Science). Michael asks for more discussion on this conference.

Personally, I am enthusiastic about ICS. It seems like one of the best ideas in decades for improving the quality of STOC/FOCS.

Tuesday, May 12, 2009

Conference in Romania

I am on the program committee for Advances in the Theory of Computing (AITC'09), to take place in Timișoara, Romania on September 26-29. The conference is a new, experimental theory track for a larger and more established conference (The 11th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC'09).

The deadline for regular papers is May 30. You know you always wanted to visit Romania, so use the next couple of weeks to wrap up a result and send it in!

Alternatively, the conference will double up as a workshop (papers in the "presentation track" will not be published in proceedings, allowing you to submit elsewhere). The deadline for such presentations is July 1st.

So come to discuss the most exciting developments in theory, while enjoying your 1-liter beers with a view of one of the most beautiful Romanian cities.

Friday, May 8, 2009

Letter Grades

In the US, final course grades are expressed in letters (the schemes I've seen are A/B/C, and A/A-/B+/B/B-). This leaves one with the interesting task of drawing barriers in between students (a task best performed at 2am the night before moving out of your apartment).

I am not comfortable with adding up different components of the grade (like the midterm and the final), as their averages and standard deviations are vastly different. Is there a good statistical approach to this problem, with some convincing theory behind it? (Being at IBM Almaden, I should automatically go for rank aggregation, I guess... But there seems to be a lot of noise in the ranks, since there are clusters of people with similar scores on each exam.)
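(For what it's worth, the naive fix I keep coming back to is to standardize each component before adding. Below is a Python sketch with made-up scores and weights; I am not claiming this is the right answer.)

    from statistics import mean, stdev

    def zscores(scores):
        # Standardize a list of scores to mean 0 and standard deviation 1.
        mu, sigma = mean(scores), stdev(scores)
        return [(s - mu) / sigma for s in scores]

    # Made-up example: the final has a much larger spread than the midterm, so a raw
    # weighted sum is dominated by the final; z-scoring puts the two on the same scale.
    midterm = [72, 68, 75, 70, 71, 69]
    final   = [40, 90, 55, 80, 65, 30]
    combined = [0.4 * zm + 0.6 * zf for zm, zf in zip(zscores(midterm), zscores(final))]
    print(sorted(range(len(combined)), key=lambda i: -combined[i]))  # students, best first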

Fortunately, if you only give one midterm (and assign low weight to the homework), you can plot the results in 2D. Unfortunately, you may end up with something like this:

Oh well... Everybody knows grades are somewhat random.

Sunday, May 3, 2009

Complexity Everywhere

I know that whenever I write about TCS politics on this blog, it ends up bad. For instance, I get a comment such as the following one (left by an anonymous to my last post):

What makes it tough for some of the papers you cite is the view that shaving off log factors is often viewed as much less interesting than larger improvements.
This, of course, makes my Latin blood run even hotter, and I cannot help writing follow-up posts (this is the first). If only I could keep my promise of not writing about politics, my life would be so much simpler. (If only I could learn from history... I got to observe my father become a leading figure in Romanian Dermatology a decade before he could get a faculty position — mainly due to his Latin blood. He got a faculty position well into his 50s, essentially going straight to department chair after the previous chair retired.)

So, let's talk about shaving off log factors (a long overdue topic on this blog). As one of my friends once said:
All this talk about shaving off log factors from complexity people, who aren't even capable of shaving on a log factor into those circuit lower bounds...
There is something very deep in this quote. Complexity theorists have gone way too long without making progress on proving hardness, their raison d'être. During this time, drawing targets around the few accidental arrows that hit walls became the accepted methodology. For instance, this led to an obsession with the polynomial / non-polynomial difference, where at least we had an accepted conjecture and some techniques for proving something.

Complexity theory is not about polynomial versus non-polynomial running times. Complexity theory is about looking at computational problems and classifying them "structurally" by their hardness. There are beautiful structures in data structures:
  • dictionaries take constant time, randomized. (But if we could prove that deterministically, dynamic dictionaries need superconstant time per operation, it would be a very powerful message about the power of randomness — one that computer scientists could understand better than "any randomized algorithm in time n^c can be simulated deterministically in time n^{10c} if E requires exponential size circuits.")

  • predecessor search requires log-log time. The lower bound uses direct sum arguments for round elimination in communication complexity, a very "complexity topic." A large class of problems are equivalent to predecessor search, by reductions.

  • the hardness of many problems is related to the structure of a binary hierarchy. These have bounds of Θ(lg n) or Θ(lg n / lg lg n) depending on interesting information-theoretic issues (roughly, can you sketch a subproblem with low entropy?). There are many nonobvious reductions between such problems.

  • we have a less sharp understanding of problems above the logarithmic barrier, but knowledge is slowly developing. For instance, I have a conjecture about 3-player number-on-forehead games that would imply n^{Ω(1)} lower bounds for a large class of problems (reductions, again!). [This was in my Dagstuhl 2008 talk; I guess I should write it down at some point.]

  • the last class of problems is the "really hard" ones: high-dimensional problems for which there is a sharp transition between "exponential space and really fast query time" and "linear space and really slow query time." Whether or not there are reductions among these is a question that has preoccupied people for quite a while (you need some gap amplification, a la PCP). Right now, we can only prove optimal bounds for decision trees (via communication complexity), and some weak connections to NP (if SAT requires strongly exponential time, partial match requires weakly exponential space).
Ok, perhaps you simply do not care about data structures. That would be short-sighted (faster data structures imply faster algorithms; so you cannot hope for lower bounds for algorithms before proving lower bounds for data structures) — but it is a mistake that I can tolerate.

Let's look at algorithms:
  • Some problems take linear time (often in very non-obvious ways).

  • Sorting seems to take super-linear time, and some problems seem to be as fast as sorting. My favorite example: undirected shortest paths takes linear time, but for directed graphs it seems you need sorting. Why?

  • FFT seems to require Θ(n lg n) time. I cannot over-emphasize how powerful an interdisciplinary message it would be if we could prove this. There are related problems: if you can beat the permutation bound in external memory, you can solve FFT in o(n lg n). The permutation bound in external memory is, to me, the most promising attack on circuit lower bounds.

  • some problems circle around the Θ(n sqrt(n)) bound, for reasons unclear. Examples: flow, shortest paths with negative lengths, min convolution with a mask. But we do have some reductions (bipartite matching is as hard as flow, bidirectionally).

  • some problems circle around the n^2 bound. Here we do have the beginning of a classification: 3SUM-hard problems. But there are many more things that we cannot classify: edit distance and many other dynamic programs, min convolution (signal processing people thought hard about it), etc.

  • some problems have an n*sort(n) upper bound, and are shown to be X+Y-hard. Though the time distinction between n^2 and n*sort(n) is tiny, the X+Y question is as tantalizing as they get.

  • some problems can be solved in n^ω by fast matrix multiplication, while others seem to be stuck at n^3 (all pairs shortest paths, given-weight triangle). But interestingly, this class is related to the n^2 problems: if 3SUM needs quadratic time, given-weight triangle requires cubic time; and if min-convolution requires quadratic time, APSP requires cubic time.

  • what can we say about all those dynamic programs that run in time n^5 or something like that? To this party, TCS comes empty-handed.

  • how about problems with super-polynomial but sub-exponential running times? Ignoring this regime is why the misguided "polynomial / non-polynomial" distinction is often confused with the very meaningful notion of "exponential hardness." There is much recent work here in fixed-parameter tractability. One can show, for instance, that k-clique requires n^{Ω(k)} time, or that some problems require 2^{Ω(tree-width)} time.

    And what can we say about k-SUM and all the k-SUM-hard problems (computational geometry in k dimensions)? This is an important illustration of the "curse of dimensionality" in geometry. I can show that if 3SAT takes exponential time, k-SUM takes n^{Ω(k)} time.

    Finally, what can we say about PTAS running times? In my paper with Piotr and Alex, we showed that some geometric problems require n^{Ω~(1/ε^2)} running time. This has a powerful structural message: the best thing to do is exhaustive search after a Johnson-Lindenstrauss projection.

  • inside exponential running time, there is the little-known work of [Impagliazzo-Paturi] showing, for instance, that sparse-3SAT is as hard as general 3SAT. Much more can be done here.
Lest we forget, I should add that we have no idea what the hard distributions might look like for these problems... Average case complexity cannot even talk about superpolynomial running times (a la hidden clique, noisy parity etc). 


This is what complexity theory is about. Sometimes, it needs to understand log factors in the running time. Sometimes, it needs to understand log factors in the exponent. Wherever there is some fascinating structure related to computational hardness, there is computational complexity.

While we construct exotic objects based on additive combinatorics and analyze the bias of polynomials, we should not forget that we are engaging in a temporary exercise of drawing a target around an arrow — a great exploration strategy, as long as it doesn't make us forget where we wanted to shoot the arrow in the first place.

And while complexity theory is too impotent right now to say anything about log factors, it should not spend its time poking fun at more potent disciplines.