Sunday, May 6, 2012

Presburger Award

Hurrah!! I've got some excellent news: I was co-awarded the 2012 Presburger Award, shared with Venkat Guruswami, the young superstar of error-correcting codes.
In short, the Presburger Award is the European "young-scientist award" in Theory.
It is given annually by the European Association for Theoretical Computer Science (EATCS) to a scientist under 35 "for outstanding contributions in Theoretical Computer Science as documented by a series of published papers."

Not surprisingly, I got the award for my work on data-structure lower bounds. The citation has a lot of nice things to say. Go read it :)

Sunday, April 15, 2012

Happy Easter

Happy Easter!

...to all my readers celebrating today in the Eastern Orthodox tradition.
According to persistent rumours:
Χριστός ἀνέστη (Christ is risen!)
Christos a înviat! (Christ is risen!)

Thursday, April 5, 2012

Happy recovery from the FOCS deadline: with a funny picture :)

Happy recovery from the FOCS deadline, everyone :)

With this kind of support from NY fashion stores, it's odd that we have recruiting problems in our field :)

Sunday, November 6, 2011

New York Theory Day

The Fall 2011 New York Theory Day is happening this Friday, November 11th, at NYU (Courant), and features yours truly as a speaker.

I hope to see many of you there!

Monday, September 19, 2011

Follow-up: Sampling a discrete distribution

This is a follow-up to my last post, on a puzzle about "Sampling a discrete distribution" (see my comment there for the solution I originally thought of).


As an anonymous commenter (Rex?) points out, there is a nice elementary solution that has been known for a long time. I admit to not knowing it ahead of time, and I use the fact that it's not really my field as an excuse. Here I want to summarize the solution for educational purposes.

Imagine that each outcome of the distribution is a vase containing liquid proportional to the probability of that outcome.

We classify these vases into 3 types:
LO — probability less than 1/n
GOOD — probability precisely 1/n
HI — probability greater than 1/n

In the ideal case when every vase is GOOD, we are done (uniform sampling). If not, we  move towards this ideal case by inductively applying the following subroutine:

** Pick a LO vase (say, with x amount of liquid) and a HI vase (with y amount of liquid).

Grab a new vase and fill it up to the 1/n mark: pour all the liquid from the LO vase, and (1/n)-x liquid from the HI vase (remember that a HI vase has liquid y>1/n, so this is certainly possible). Note that the HI vase may now become LO, GOOD, or stay HI.

/end of **

Observe that the process terminates in at most n steps, since each step increments the number of GOOD vases. If we maintain 3 linked lists with the vases of each type, each step can be done in O(1) time, so O(n) time in total.
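
For concreteness, here is a minimal Python sketch of this preprocessing. The names (mark, other) and the exact bookkeeping are my own; the water-mark and pointer recorded for each vase are explained in the next two paragraphs.

def build_vases(p):
    # p[0..n-1] is the input distribution (probabilities summing to 1).
    # Scale by n so that a GOOD vase holds exactly 1 unit of liquid.
    n = len(p)
    mass = [pi * n for pi in p]
    lo = [i for i in range(n) if mass[i] < 1.0]
    hi = [i for i in range(n) if mass[i] >= 1.0]  # treating exactly-1/n vases as HI is harmless
    # For each vase i, record a water-mark and a pointer (see the next paragraphs):
    # the liquid below mark[i] came from outcome i, the rest from outcome other[i].
    mark = [1.0] * n
    other = list(range(n))
    while lo and hi:
        i = lo.pop()                # a LO vase, with mass[i] < 1
        j = hi.pop()                # a HI vase, with mass[j] >= 1
        mark[i] = mass[i]           # pour all of i, then top vase i up to the 1/n line from j
        other[i] = j
        mass[j] -= 1.0 - mass[i]    # vase j may now be LO, GOOD, or still HI
        (lo if mass[j] < 1.0 else hi).append(j)
    # Any leftovers (floating-point slack) are simply treated as GOOD.
    return mark, other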

At the end all vases are GOOD, so we can use uniform sampling. Given a random x in [0,1], we use ⌊x·n⌋ to pick a vase and x·n mod 1 to pick a molecule of water inside that vase. If we trace back the original vase from which this molecule came, we have sampled according to the initial distribution.

Now we describe how to "undo" the water pouring at query time, i.e. how to trace the origin of each water molecule. Observe that operation (**) is never applied to a GOOD vase, so once a vase becomes GOOD it has reached its final form. But the water in a GOOD vase can only come from two distinct vases. So we can essentially draw a water-mark on the vase, separating the LO water from the HI water, and store pointers to the original vases whose water got poured into the GOOD vase. The sampling algorithm then needs only one comparison: it compares x·n mod 1 to the mark on vase ⌊x·n⌋, and follows a pointer from this vase to the input vase from which the water was poured. Note that only one pointer is followed, since a GOOD vase is never touched again by operation (**). Constant time!
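
In code, the query might look like the following sketch, using the mark/other arrays built above and Python's random() as the [0,1] primitive (again, the naming is mine):

import random

def sample(mark, other):
    n = len(mark)
    x = random.random() * n
    i = int(x)          # floor(x*n): which vase the molecule lands in
    frac = x - i        # x*n mod 1: where inside that vase
    # One comparison against the water-mark, then at most one pointer followed.
    return i if frac < mark[i] else other[i]

# Hypothetical usage: with mark, other = build_vases([0.5, 0.3, 0.1, 0.1]),
# the frequencies of sample(mark, other) over many trials approximate the input distribution.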

Friday, September 16, 2011

Sampling a discrete distribution

The following cute question came up at lunch today (all credit goes to Aaron Archer).


Informal challenge: You are given a discrete distribution with n possible outputs (and you know the probability p_i of each output), and you want to sample from the distribution. You can use a primitive that returns a random real number in [0,1].

This was the way the problem came to me, and the goal was to find a very fast practical solution. Think of n as around 10^6 and use a regular machine as your machine model (real number means a "double" or something like that).


Formal challenge: To formalize this completely for a bounded-precision computer, we can say that we work on a w-bit machine and the probabilities have w-bit precision. The distribution is given by n integers x_i, and these values sum to exactly 2^w. Output "i" should be produced with probability exactly x_i / 2^w. You are given a primitive that produces a uniformly distributed w-bit number.

The challenge is to get O(1) worst-case time. (NB: some fancy data structures required.)
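
Just to make the interface concrete, here is a straightforward baseline sketch in Python: binary search over prefix sums, which is O(log n) per sample, not the O(1) the challenge asks for. The fixed w = 32 and the function names are illustrative choices of mine, not part of the problem.

import bisect, random

W = 32                                    # word size; the x[i] sum to exactly 2**W

def preprocess(x):
    # x: list of n nonnegative integers summing to exactly 2**W.
    prefix, total = [], 0
    for xi in x:
        total += xi
        prefix.append(total)
    assert total == 1 << W
    return prefix

def sample_baseline(prefix):
    # Draw one uniformly random W-bit number; output i with probability x[i] / 2**W.
    r = random.getrandbits(W)             # uniform over {0, ..., 2**W - 1}
    return bisect.bisect_right(prefix, r)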

Thursday, July 28, 2011

International Olympiad, Day 2

I suspect I got a lot of bad karma this day (I'm the evil mind behind "Crocodile" and "Elephants".) Congratulations to the winners! See here for final results.

Problem Crocodile. You have a weighted undirected graph with a start node and k designated target nodes. Two players play a full-information game, taking turns:
  1. Player 1 moves on the nodes of the graph, starting at the designated start node, and aiming to reach a target node. In each turn, she can traverse any edge out of her current node [except the blocked edge / see below], incurring a cost equal to the weight of the edge.

  2. Player 2 can block one edge at every turn. When an edge is blocked, Player 1 cannot traverse it in the next turn. An edge can be blocked multiple times, and any past edge is "unblocked" when a new edge is blocked.
Find the minimum budget B such that Player 1 can reach a target node with cost ≤ B, regardless of what Player 2 does. Running time: Õ(n + m).

Problem Elephants. You have n elephants on a line at given coordinates. The elephants make n moves: in each unit of time, some elephant moves to another position. You have cameras that can photograph any interval of length L of the line. After each move, output the minimum number of cameras needed to photograph all elephants in the current configuration.

Running time: subquadratic in n.

Problem Parrots. You are given a message of M bytes (0..255) to transmit over a weird channel. The channel can carry elements between 1 and R, but elements don't arrive in order (they are permuted adversarially). Send the message, minimizing the number L of elements sent across the channel.

Running time: polynomial. (Yes, this is an easy problem for a trained theorist.)