Friday, July 17, 2009

CEOI: Problem 1

As I've announced before, I was the chair of the Scientific Committee at The 16th Central European Olympiad in Informatics (CEOI'09). I will have a few non-technical post on the experience, but in the next 6 posts, I will simply let you enjoy the problems.


I would encourage all my theory friends to think about these problems. What you might gain:
  1. You can form your own impression on the level of these contests (I think it is quite high, and it will allow us to recuit many researchers down the road. Consider that one has to solve and implement 3 problems in 5 hours.)
  2. These are great algorithmic puzzles, of the kind that is hard to come by (as opposed to math puzzles, which are everywhere, but I find boring).
  3. You may finally answer the question: Are you smarter than an 11th grader? :)
Please feel free to use the comments section. The official solution to each problem will be posted together with the next problem.

To start lightly, today I will show you the easiest problem. (It is perhaps easy enough for an exam in Introduction to Algorithms... What do you think?)

Problem: LOGS. Given an N by M binary matrix, find the area of the biggest rectangle consisting entirely of values of 1, knowing that you can permute the columns of the matrix.

Desired running time: O(MN).

For example, if the matrix is:
001010
111110
011110
111110
011110
111111
110111
110111
000101
010101

The answer should be 21: by permuting the columns such that columns 2, 4, and 5 are adjacent, you have a rectangle of area 21 (rows 2-8 and columns 2, 4, 5).

PS: Yes, today's my birthday. Only 13 more years to win the Nevanlinna Prize :)

16 comments:

Anonymous said...

Loop over rows from top to bottom, in each cell store the length of the longest chain of 1's above it, do counting sort on each row to sort in reverse order, then loop over the row again from left to right and take the best prefix, then take the best over all rows.

Challenge:
Prove/disprove: There is a one-pass algorithm maintaining only O(1) words of space.

Also, I see an American won. GO USA!

Mihai said...

Anon, this is O(N^2), assuming N≥M.

70 points.

Anonymous said...

Honest honest honest, I just realized that it was N^2 and rushed back to my computer to fix it, but somehow you responded before I could! You can also get O(MN + M^2) by rotating the matrix. Take the cheaper of rotating vs. non-rotating, and you get O(MN).

Damn you, and happy birthday.

-Jelani

Mihai said...

Regarding the challenge... It seems easy to show that Omega(M) space is needed by reduction from set disjointness.

Mihai said...

Jelani, what are you doing solving kids' problems? I have more serious stuff for you in the later posts. :)

Also, I should add there's a more elegant solution to get O(MN), which is also a "streaming" one.

Anonymous said...

I'm a kid at heart.

And, next time I'll try to spend 2 seconds thinking before I issue challenges ...*woops*

Anonymous said...

Happy birthday Mihai. And this question is cute... --Sariel

P.S. Good luck with the prize

Anonymous said...

The more elegant solution is to notice that after each row each "longest chain" either becomes zero-length or increases by 1. Therefore, it is very easy to keep the set of columns sorted (just move the 0s to the end). O(MN).

Anonymous said...

Also, could you just sketch the proof of how you prove the need for O(M) space for those who are not into lower bounds :D

Anonymous said...

Omega(M), sorry :D

Mihai said...

The lower bound proof is a reduction from set disjointness:

Set disjointness. Alice and Bob have sets S, T subset [M]. They want to communicate in order to find whether S is disjoint from T.

Theorem: The communication needed is Omega(M).

The reduction. Alice generates a matrix of (N-1) by M. Every column in S will be all ones; other columns are all zeroes.

Bob adds the last row of the matrix: he puts a one for every element in the complement of T.

If S and T are disjoint, S is included in the complement of T, so all 1-columns gain one more 1. Thus, the answer is |S|*N.

Otherwise, some column does not gain the final one. Thus, the answer is strictly less than |S|*N.

If there is a streaming algorithm with space o(M), Alice can run it on the first N-1 rows, then send the memory state to Bob, and Bob can finish running the algorithm. The algorithm would answer disjointness, thus contradicting the communication lower bound.

Anonymous said...

Thanks for the explanation. It is very neat. Can you use communication complexity to find lower bounds on non-necessarily streaming algorithms as well? Can you recommend a good intro to lowerbounds/comm complexity?

Anonymous said...

Seems that doing a normal O(M log M) sort for each row gives an O(NM log M) solution. Does the contest distinguish between O(NM) and O(NM log M)?

Mihai said...

Can you use communication complexity to find lower bounds on non-necessarily streaming algorithms as well?

I have been using it rather successfully to prove lower bounds for data structures. For instance, look at this paper.

In algorithms, we don't really know any lower bounds at all, except by reductions (assume SAT takes exponential time, then something else).

Can you recommend a good intro to lowerbounds/comm complexity?

Not really... The field is yet unwritten, so to say. For any field that is young and active, I would recommend starting with a paper and doing random walks in the bibliography.

Mihai said...

Seems that doing a normal O(M log M) sort for each row gives an O(NM log M) solution. Does the contest distinguish between O(NM) and O(NM log M)?

This solution runs some 4-5 times slower, and earns 50 points.

Piotr said...

about sorting: we want to have a sorted list of columns, according to how long during the scan we have seen 1's in a column without interruption

if you scan and you encounter a 0 in a columns that is in the list, you delete it from the list

if you encounter a 1 in a column not in the list, you insert it at the end

to facilitate deletions in constant time, we can have doubly linked list, so each column knows its predecessor and successor on the list (if it is in the list)

But with O(1) words of space, I do not know.