Sunday, July 19, 2009

CEOI: Problem 2

The discussion on Problem 1 was quite active; I even posted a lower bound for the streaming complexity of the problem. The official solution is:

The basic idea is to sweep through each row i, maintaining the "height" of each column, i.e., the number of consecutive ones in that column extending upwards from row i. Given the heights of the columns, one has to find a subset S of columns maximizing |S| · min(S). You can find this subset by trying every value for min(S) and counting how many heights are greater than or equal to it. By keeping the heights in sorted order, this computation takes linear time.
To make the whole algorithm run in O(NM) time, the key observation is that, when the sweep line advances to the next row, each height is either incremented or reset to zero. This means the sorted order can be maintained in linear time: place the values that become zero up front, and keep the order of the rest (which is unchanged by incrementing them all).
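
For concreteness, here is a minimal Python sketch of this sweep (the function name and the input format, a list of '0'/'1' strings with one string per row, are my choices, not part of the official solution). It keeps the columns sorted by height in descending order, the mirror image of the "zeros up front" bookkeeping above, so the best rectangle ending at the current row falls out of a single linear scan:

    def max_ones_rectangle(grid):
        """Largest all-ones rectangle after permuting columns, in O(N*M).

        grid is a list of equal-length strings over '0'/'1', one per row.
        """
        if not grid:
            return 0
        m = len(grid[0])
        heights = [0] * m        # ones extending upward from the current row
        order = list(range(m))   # column indices, sorted by height (descending)
        best = 0

        for row in grid:
            # Advancing the sweep line: a height is either incremented or reset.
            for c in range(m):
                heights[c] = heights[c] + 1 if row[c] == '1' else 0

            # Restore the sorted order in O(M): incremented columns keep their
            # relative order; zeroed columns drop to the back.
            order = ([c for c in order if row[c] == '1'] +
                     [c for c in order if row[c] == '0'])

            # Try every value of min(S): the k tallest columns give a rectangle
            # of width k and height equal to the k-th largest height.
            for k, c in enumerate(order, start=1):
                if heights[c] == 0:
                    break
                best = max(best, k * heights[c])

        return best

On the 10-by-6 example from the LOGS statement (in the post below), this returns 21.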
Let us now move to a problem of average difficulty. Originally, I was worried it would be too easy (the contestants really know their dynamic programming), but it seems the details behind the dynamic program were unusual enough that some people missed them.


Problem: Photo. You are given a skyline photograph taken during the night. Some rooms still have the light on. You know that all the buildings can be modeled by rectangles of surface area at most A. Find the minimum number of buildings that can lead to the picture.

In other words, you are given an integer A, and N points at integer coordinates (x,y). You must find a minimum number of rectangles with one side on the x-axis and area at most A, which cover all points. The rectangles may overlap.

Desired running time: polynomial in N.
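
To make the statement concrete, here is a tiny exponential-time reference solver in Python. It is emphatically not the intended polynomial-time solution; the names and conventions (points as (x, y) pairs with y > 0, degenerate zero-width rectangles allowed) are my own assumptions. It uses only two observations: a group of points fits under one building iff the smallest base-anchored rectangle containing it has area at most A, and any cover can be turned into a partition of the points using no more rectangles.

    from functools import lru_cache

    def min_buildings_bruteforce(A, points):
        """Minimum number of rectangles with one side on the x-axis and area
        at most A covering all points; brute force over partitions, so only
        usable for very small N."""
        n = len(points)

        def fits_one_building(mask):
            # The cheapest rectangle covering a group spans its x-range and
            # reaches its maximum height.
            xs = [points[i][0] for i in range(n) if mask >> i & 1]
            ys = [points[i][1] for i in range(n) if mask >> i & 1]
            return (max(xs) - min(xs)) * max(ys) <= A

        @lru_cache(maxsize=None)
        def solve(mask):
            if mask == 0:
                return 0
            low = mask & -mask   # the lowest-indexed uncovered point
            best = n             # one (degenerate) rectangle per point always works
            sub = mask
            while sub:
                # Force `low` into the next rectangle so that each partition
                # is enumerated exactly once.
                if sub & low and fits_one_building(sub):
                    best = min(best, 1 + solve(mask & ~sub))
                sub = (sub - 1) & mask
            return best

        return solve((1 << n) - 1)

    print(min_buildings_bruteforce(10, [(0, 3), (2, 4), (9, 1)]))   # 2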

Friday, July 17, 2009

CEOI: Problem 1

As I've announced before, I was the chair of the Scientific Committee at the 16th Central European Olympiad in Informatics (CEOI'09). I will have a few non-technical posts on the experience, but in the next 6 posts, I will simply let you enjoy the problems.


I would encourage all my theory friends to think about these problems. What you might gain:
  1. You can form your own impression of the level of these contests (I think it is quite high, and it will allow us to recruit many researchers down the road. Consider that one has to solve and implement 3 problems in 5 hours.)
  2. These are great algorithmic puzzles, of a kind that is hard to come by (as opposed to math puzzles, which are everywhere, but which I find boring).
  3. You may finally answer the question: Are you smarter than an 11th grader? :)
Please feel free to use the comments section. The official solution to each problem will be posted together with the next problem.

To start lightly, today I will show you the easiest problem. (It is perhaps easy enough for an exam in Introduction to Algorithms... What do you think?)

Problem: LOGS. Given an N by M binary matrix, find the area of the biggest rectangle consisting entirely of values of 1, knowing that you can permute the columns of the matrix.

Desired running time: O(MN).

For example, if the matrix is:
001010
111110
011110
111110
011110
111111
110111
110111
000101
010101

The answer should be 21: by permuting the columns such that columns 2, 4, and 5 are adjacent, you have a rectangle of area 21 (rows 2-8 and columns 2, 4, 5).

PS: Yes, today's my birthday. Only 13 more years to win the Nevanlinna Prize :)

Friday, June 26, 2009

The Conceptual Wars

Scott Aaronson, in his regular post-FOCS-notification column, has pointed to me as a leader of the Technicians Resistance Army of TCS (a leader, at least, in irascibility).


As Anup points out, our guerrilla force shall strive to remain concealed in the shadows of the dark STOC/FOCS committee rooms. But, as an exposed member, I must accept my destiny as a preacher of our movement during the coming conceptual wars.

My basic plan is to sing the ideas of theoretical computer science from the rooftops [1]. I want to convince you that we have a destiny grander than communicating with quantum aliens. I want to remind you of the beauty of the algorithm. I want to tell you mystical prophecies about the depth our field will achieve when it shows, e.g., that max flow needs superlinear time. And, yes, I want to reprehend you for behaving like a bitter fringe mathematician instead of accepting the place of TCS as the New Maths... Or for having such low self-esteem that you will gladly wipe the floors of the Economics department for some attention. And, against all odds, I want to convince you that solving hard problems is more rewarding than inventing easy new ones.

***

Just, please, hold off the war for a few weeks. For now, I am the chair of the Scientific Committee of the CEOI (Central European Olympiad in Informatics). These kids have prepared for years to get here, and we owe them some exciting problems (which will certainly find their way to the blog later).


–––––
[1] This cool quotation comes, I believe, from Scott's job application. I learned of it in a bar from a friend who was on the job market at the same time as Scott. My friend added "I don't have this [...] in me."

Thursday, June 4, 2009

Conceptual Contributions Conference

At STOC, there was an announcement of a new conference dedicated to conceptual contributions, titled ICS (Innovations in Computer Science). Michael asks for more discussion on this conference.

Personally, I am enthusiastic about ICS. It seems like one of the best ideas in decades for improving the quality of STOC/FOCS.

Tuesday, May 12, 2009

Conference in Romania

I am on the program committee for Advances in the Theory of Computing (AITC'09), to take place in Timișoara, Romania on September 26-29. The conference is a new, experimental theory track for a larger and more established conference (The 11th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC'09).

The deadline for regular papers is May 30. You know you always wanted to visit Romania, so use the next couple of weeks to wrap up a result and send it in!

Alternatively, the conference will double as a workshop (papers in the "presentation track" will not be published in the proceedings, leaving you free to submit them elsewhere). The deadline for such presentations is July 1st.

So come to discuss the most exciting developments in theory, while enjoying your 1-liter beers with a view of one of the most beautiful Romanian cities.

Friday, May 8, 2009

Letter Grades

In the US, final course grades are expressed in letters (the schemes I've seen are A/B/C and A/A-/B+/B/B-). This leaves one with the interesting task of drawing boundaries between students (a task best performed at 2am the night before moving out of your apartment).

I am not comfortable with adding up different components of the grade (like the midterm and the final), as their averages and standard deviations are vastly different. Is there a good statistical approach to this problem, with some convincing theory behind it? (Being at IBM Almaden, I should automatically go for rank aggregation, I guess... But there seems to be a lot of noise in the ranks, since there are clusters of people with similar scores on each exam.)
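
To illustrate the concern with a toy example (invented numbers; standardization is offered only as the textbook first step, not as the convincing theory the question asks for): when one exam has a much larger spread, adding raw scores lets that exam dominate the ranking, while z-scoring each component first puts them on a comparable scale.

    import statistics

    midterm = [55, 60, 62, 58, 57]   # tight cluster
    final   = [40, 80, 70, 95, 60]   # much wider spread

    def zscores(xs):
        mu, sigma = statistics.mean(xs), statistics.pstdev(xs)
        return [(x - mu) / sigma for x in xs]

    raw_total = [m + f for m, f in zip(midterm, final)]
    std_total = [zm + zf for zm, zf in zip(zscores(midterm), zscores(final))]

    # raw_total orders the students exactly as the final alone would;
    # std_total lets the midterm matter again.
    print(raw_total)
    print(std_total)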

Fortunately, if you only give one midterm (and assign low weight to the homework), you can plot the results in 2D. Unfortunately, you may end up with something like this:

Oh well... Everybody knows grades are somewhat random.

Sunday, May 3, 2009

Complexity Everywhere

I know that whenever I write about TCS politics on this blog, it ends badly. For instance, I get a comment such as the following one (left by an anonymous commenter on my last post):

What makes it tough for some of the papers you cite is the view that shaving off log factors is often viewed as much less interesting than larger improvements.
This, of course, makes my Latin blood run even hotter, and I cannot help writing follow-up posts (this is the first). If only I could keep my promise of not writing about politics, my life would be so much simpler. (If only I could learn from history... I got to observe my father become a leading figure in Romanian dermatology a decade before he could get a faculty position — mainly due to his Latin blood. He got a faculty position well into his 50s, essentially going straight to department chair after the previous chair retired.)

So, let's talk about shaving off log factors (a long overdue topic on this blog). As one of my friends once said:
All this talk about shaving off log factors from complexity people, who aren't even capable of shaving on a log factor into those circuit lower bounds...
There is something very deep in this quote. Complexity theorists have gone way too long without making progress on proving hardness, their raison d'être. During this time, drawing targets around the few accidental arrows that hit walls became the accepted methodology. For instance, this led to an obsession with the polynomial / non-polynomial distinction, where at least we had an accepted conjecture and some techniques for proving something.

Complexity theory is not about polynomial versus non-polynomial running times. Complexity theory is about looking at computational problems and classifying them "structurally" by their hardness. There are beautiful structures in data structures:
  • dictionaries take constant time, randomized. (But if we could prove that, deterministically, dynamic dictionaries need superconstant time per operation, it would be a very powerful message about the power of randomness — one that computer scientists could understand better than "any randomized algorithm running in time n^c can be simulated deterministically in time n^{10c} if E requires exponential-size circuits.")

  • predecessor search requires log-log time. The lower bound uses direct-sum arguments for round elimination in communication complexity, a very "complexity" topic. A large class of problems is equivalent to predecessor search, by reductions.

  • the hardness of many problems is related to the structure of a binary hierarchy. These have bounds of Θ(lg n) or Θ(lg n / lg lg n), depending on interesting information-theoretic issues (roughly, can you sketch a subproblem with low entropy?). There are many nonobvious reductions between such problems.

  • we have a less sharp understanding of problems above the logarithmic barrier, but knowledge is slowly developing. For instance, I have a conjecture about 3-player number-on-forehead games that would imply n^{Ω(1)} lower bounds for a large class of problems (reductions, again!). [This was in my Dagstuhl 2008 talk; I guess I should write it down at some point.]

  • the last class of problems consists of the "really hard" ones: high-dimensional problems for which there is a sharp transition between "exponential space and really fast query time" and "linear space and really slow query time." Whether there are reductions among these is a question that has preoccupied people for quite a while (you need some gap amplification, à la PCP). Right now, we can only prove optimal bounds for decision trees (via communication complexity), and some weak connections to NP (if SAT requires strongly exponential time, partial match requires weakly exponential space).
Ok, perhaps you simply do not care about data structures. That would be short-sighted (faster data structures imply faster algorithms; so you cannot hope for lower bounds for algorithms before proving lower bounds for data structures) — but it is a mistake that I can tolerate.

Let's look at algorithms:
  • Some problems take linear time (often in very non-obvious ways).

  • Sorting seems to take super-linear time, and some problems seem to be as fast as sorting. My favorite example: undirected shortest paths takes linear time, but for directed graphs it seems you need sorting. Why?

  • FFT seems to require Θ(n lg n) time. I cannot over-emphasize how powerful an interdisciplinary message it would be if we could prove this. There are related problems: if you can beat the permutation bound in external memory, you can solve FFT in o(n lg n). The permutation bound in external memory is, to me, the most promising attack on circuit lower bounds.

  • some problems circle around the Θ(n sqrt(n)) bound, for reasons unclear. Examples: flow, shortest paths with negative lengths, min convolution with a mask. But we do have some reductions (bipartite matching is as hard as flow, and vice versa).

  • some problems circle around the n^2 bound. Here we do have the beginning of a classification: 3SUM-hard problems (a sketch of the classic quadratic 3SUM algorithm appears right after this list). But there are many more things that we cannot classify: edit distance and many other dynamic programs, min convolution (signal-processing people have thought hard about it), etc.

  • some problems have an n·sort(n) upper bound, and are shown to be X+Y-hard. Though the time distinction between n^2 and n·sort(n) is tiny, the X+Y question is as tantalizing as they get.

  • some problems can be solved in n^ω time by fast matrix multiplication, while others seem to be stuck at n^3 (all-pairs shortest paths, given-weight triangle). But interestingly, this class is related to the n^2 problems: if 3SUM needs quadratic time, given-weight triangle requires cubic time; and if min-convolution requires quadratic time, APSP requires cubic time.

  • what can we say about all those dynamic programs that run in time n^5 or something like that? To this party, TCS comes empty-handed.

  • how about problems with super-polynomial, sub-exponential running times? Ignoring this regime is why the misguided "polynomial / non-polynomial" distinction is often confused with the very meaningful "exponential hardness." There is much recent work here in fixed-parameter tractability. One can show, for instance, that k-clique requires n^{Ω(k)} time, or that some problems require 2^{Ω(tree-width)} time.

    And what can we say about k-SUM and all the k-SUM-hard problems (computational geometry in k dimensions)? This is an important illustration of the "curse of dimensionality" in geometry. I can show that if 3SAT takes exponential time, k-SUM takes n^{Ω(k)} time.

    Finally, what can we say about PTAS running times? In my paper with Piotr and Alex, we showed that some geometric problems require n^{Ω~(1/ε^2)} running time. This has a powerful structural message: the best thing to do is exhaustive search after a Johnson-Lindenstrauss projection.

  • inside exponential running time, there is the little-known work of [Impagliazzo-Paturi] showing, for instance, that sparse-3SAT is as hard as general 3SAT. Much more can be done here.
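
As promised in the n^2 bullet, here is the classic quadratic-time algorithm for 3SUM itself (sort once, then scan with two pointers for each fixed element); the conjecture that this is essentially optimal is what gives the 3SUM-hardness reductions their bite. A minimal Python sketch:

    def has_3sum(a):
        """Return True iff some three (distinct-index) elements of a sum to zero.
        The classic O(n^2) algorithm: sort once, then for each fixed element
        scan the rest with two pointers."""
        a = sorted(a)
        n = len(a)
        for i in range(n - 2):
            lo, hi = i + 1, n - 1
            while lo < hi:
                s = a[i] + a[lo] + a[hi]
                if s == 0:
                    return True
                if s < 0:
                    lo += 1
                else:
                    hi -= 1
        return False

    print(has_3sum([3, -7, 1, 6, -2, 9]))   # True: 1 + 6 - 7 = 0
    print(has_3sum([1, 2, 3, 4]))           # False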
Lest we forget, I should add that we have no idea what the hard distributions might look like for these problems... Average-case complexity cannot even talk about superpolynomial running times (à la hidden clique, noisy parity, etc.).


This is what complexity theory is about. Sometimes, it needs to understand log factors in the running time. Sometimes, it needs to understand log factors in the exponent. Wherever there is some fascinating structure related to computational hardness, there is computational complexity.

While we construct exotic objects based on additive combinatorics and analyze the bias of polynomials, we should not forget that we are engaging in a temporary exercise of drawing a target around an arrow — a great exploration strategy, as long as it doesn't make us forget where we wanted to shoot the arrow in the first place.

And while complexity theory is too impotent right now to say anything about log factors, it should not spend its time poking fun at more potent disciplines.