Sunday, July 19, 2009

CEOI: Problem 2

The discussion on Problem 1 was quite active; I even posted a lower bound for the streaming complexity of the problem. The official solution is:

The basic idea is to sweep through each line i, and maintain the “height” of each column, i.e., the number of ones in that column extending upwards from line i. Given the heights of each column, one has to find a subset S of columns maximizing |S| • min(S). You can find this subset by trying values for min(S) and counting how many heights are greater or equal. By maintaining the heights in sorted order, this computation becomes linear time.
To make the algorithm run in O(NM) time, the key observation is that, while the sweep line advances to another row, a height is either incremented or reset to zero. This means that one can maintain the sorted order in linear time: place the values that become zero up front, and maintain the order for the rest (which is unchanged by incrementing them).
Let us now move to a problem of average difficulty. Originally, I was worried that it would be too easy (the contestants really know dynamic programming), but it seems the details behind the dynamic program were unusual enough, and some people missed them.


Problem: Photo. You are given a skyline photograph taken during the night. Some rooms still have the light on. You know that all the buildings can be modeled by rectangles of surface area at most A. Find the minimum number of buildings that can lead to the picture.

In other words, you are given an integer A, and N points at integer coordinates (x,y). You must find a minimum number of rectangles with one side on the x-axis and area at most A, which cover all points. The rectangles may overlap.

Desired running time: polynomial in N.

18 comments:

Anonymous said...

Unless I misunderstand the problem, this can be done by a simple linear time greedy algorithm: start from the left hand side, cover as many columns as possible using rectangle of surface area at most A, then repeat on the remaining columns. What am I missing?

Mihai said...

What am I missing?

In the optimal solution to some instances, the rectangles actually overlap.

Anonymous said...

Ah, Nevermind I take that back. Tricky.

Anonymous said...

Ok, now that i've had my breakfast, heres take 2:

Subproblem is a rectangular subset of the skyline with all x and y coordinates of the corners appearing as x or y coordinates of the points. Thus there are n^4 relevant subproblems. Notice that, given a building of minimum height in the optimum solution, we can assume without loss of generality that there are no other buildings that overlap its left or right boundary (otherwise can "compress" this lowest building). Therefore, fixing such a lowest building, this leaves 3 subproblems: the skyline left of the building, the skyline above the building, and the skyline to the right of the building. Notice that there is also at most n^4 choices of "lowest building". Therefore, doing DP and memoizing as usual gives an O(n^8) algorithm. Yikes! I think the runtime can be improved.

Anonymous said...

Similarly to anon's comment, there exists an optimal solution where the x coordinates of the buildings nest (i.e. like correctly nested parenthesis).

Dynamic programming: compute a[x1][x2][h] to be the smallest number of buildings needed to cover all points between x1, x2 which have a height y > h.

To compute a[x1][x2][h] iterate over all possible x3's where the building that starts at x1 (or the smallest x coordinate of a point in the current region) can end, compute the largest possible height h' of this block and the solution is min{1 + a[x1][x3][h'] + a[x3][x2][h]} over all possible choices of x3.

Space O(n^3), time O(n^4).

Anonymous said...

Also, challenge: prove a lower bound on a streaming algorithm (one that reads the coordinates of the points in increasing order of (x,y))

rpeng said...
This comment has been removed by the author.
rpeng said...

I saw a problem similar to this a while ago, which was given M horizontal line segments, select a min cost subset so all N points are covered from above. There is O(N^2M) for that as well.

I am still very suspicious that something O(NM) or even (M+N)*polylog isn't possible for it. Is there an obvious reason that such solutions can't exist?

Mihai said...

Anonymous at 10:55 has a correct O(n^4) solution. This was enough to get 100 points during the contest.

Richard: Let's take it easy :) Even beating O(n^4) is not obvious, so never mind linear time for now.

Challenge: give an O(n^3) algorithm. [Yes, I know how to do it.]

Challenge: Can you do o(n^3)? [No clue.]

u1ik said...

Challenge: give an O(n^3) algorithm. [Yes, I know how to do it.]

We could use the same trick that transforms O(n^3) into O(n^2) in the optimal binary search tree algorithm.

For each triple x1, x2, h remember the optimal x3 in o[x1][x2][h]. In order to find x3 it is sufficient to iterate from o[x1][pred(x2)][h] to o[succ(x1)][x2][h].

Mihai said...

@u1ik: This was also my first approach to the problem. But I couldn't see any obvious reason why this property holds. When you think about it, it's possible that adding a point to the right might reconfigure the optimal solution in quite drastic, non-local ways...

My O(n^3) is clean and "obviously works."

u1ik said...
This comment has been removed by the author.
Anonymous said...

O(n^3).

First compute b[h1][h2] the minimal number of *non-overlapping* buildings needed to cover every point at height between h1 and h2 (greedy alg O(n)).

Then compute a[h1][h2] the minimal number of buildings needed to cover all points from h1 to h2 in O(n^3) using b[h1][h2].

Anonymous said...

My previous claim of O(n^3) with b[h1][h2] and a[h1][h2] is bogus.

Anonymous said...

The optimal solution either (1) splits into two independent subproblems by x coordinate, or (2) uses a single rectangle to cover the first and the last (by x coordinate) points. This seems to give about n^2 subproblems and n choices for each. (The key observation is the one about nesting parentheses.)

Mihai said...

Anon, the state space is n^3, since you also need to remember the height that was last cut.

rpeng said...
This comment has been removed by the author.
rpeng said...

I have been giving this problem occassional thought, but am nowhere near finding an O(n^3). Does your O(n^3) not use any high-powered data structures? Is it based on optimizing using the same state used for the O(n^4)?

I think it might be possible to prove there is some definition of o[x1][x2][h] that is monotone in x2 using the fact dp[x1][x2][h] is monotone in x2:
d[x1][x2+1][h]=dp[x1][x2][h], in which case o[x1][x2+1][h] also gives an optimal solution to [x1][x2][h] by monotonicity.
Otherwise, dp[x1][x2+1][h]=dp[x1][x2][h]+1, o[x1][x2][h] would also be optimal for [x1][x2+1][h] since a single rectangle can cover the right most point.

This doesn't seem to lead to O(n^3) though since it's not clear at choice of optimal decision points are these. It also doesn't work for the weighted segment cover version. Div conquer techniques might be viable to give O(n^3logn) though.

It might also be possible to view the problem as O(n^2) states of [x1][x2], but it requires another DP on the lowest rectangle covering [x1][x2] and breaking up the points remaining into intervals using another DP. That step seems to bring things back into O(n^4) again though.