Comments on WebDiarios de Motocicleta: Sampling a discrete distribution

My solution went as follows. In principle, each po...

2011-09-19T12:46:28.918-04:00

My solution went as follows. In principle, each possible output gets an interval inside [0,1] and the generated sample is the interval stabbed by the random number. Trivially, this translates to predecessor search, which is not constant time. But we are allowed to shuffle the intervals around, and this will make predecessor search O(1)-time. (It turns out that this magical shuffling is just sorting by length.)

Assume all probabilities are between p and 2p. Then we can easily solve the problem in O(1) time.

Observation 1: Given a point in [0,1], we can locate it inside intervals of roughly equal size (up to a factor of 2) in constant time.

Proof: We impose a grid of precision p and store the interval stabbed by every grid point. Then round the query point to the grid, and do a linear search among the intervals starting from where the nearest grid point lands. This cannot inspect more than O(1) intervals.

Observation 2: Bucket probabilities between 2^k/2^w and 2^(k+1)/2^w for every k. Apply the solution above inside a bucket. The query first needs to figure out which bucket the random number landed in. This is predecessor search among w start points (the beginning of the first interval of each bucket). But this only needs constant time with fusion trees.

While fusion trees are a non-elementary data structure and probably not practical, one can say they "unite theory and practice". In practice, binary search among 32 values is not O(1) time, it's more like zero time because everything is in cache.

As an anonymous commenter points out, I overcomplicate things, and there is a very nice elementary solution already known in the literature. I'll summarize this in a later comment.

Not quite answering your question: See page 3 of t...

2011-09-19T08:29:42.493-04:00

Not quite answering your question: See page 3 of this paper, section "Refinement", for a solution with expected constant time and no fancy data structure (only a trie with precomputed values).

The Alias method does solve this problem, at least...

2011-09-17T16:18:53.463-04:00

The Alias method does solve this problem, at least as it is described in Non-Uniform Random Variate Generation by Luc Devroye http://cg.scs.carleton.ca/~luc/rnbookindex.html

The key insight is that any discrete distribution over n possibilities can be rewritten as a mixture with uniform mixing proportions over 2-possibility discrete distributions. Then the cost for a sample (after setup) is 1 or 2 table lookups and 1 or 2 calls to the primitive random number source. The setup procedure can be done in linear time and only requires saving 2 tables of O(n) space since WLOG we can order the first components of each mixture component by index.

Isn't the obvious O(log n)-time algorithm prac...

2011-09-17T14:18:59.197-04:00

Isn't the obvious O(log n)-time algorithm practical enough?

Jim is right. It is explained a little better here...

2011-09-17T13:10:14.708-04:00

Jim is right. It is explained a little better here: http://www.scribd.com/doc/55772825/62/Alias-Method

Basically you want to cut up the probabilities p_i into 2n pieces y_jk for k=1,2, j=1,...,n, in such a way that y_j1+y_j2=1/n, and the sum of y_jk that are associated with p_i sum up to p_i. (Then sampling in constant time is easy.) This turns out to be easy: Just take the smallest p_i (say p_min), define y_11 = p_min and y_12 = 1/n-p_min. Associate y_11 with p_min and y_12 with p_max (the largest p_i). Set p_max = p_max - 1/n + p_min (which is larger than 0 since p_max >= 1/n). Now recurse on the probabilities except p_min.

Apologies for the ugly writeup of the very beautiful idea. And thanks Rex! I learned something today!

Here's a 2-page note by Warren Smith, who appe...

2011-09-16T23:51:24.227-04:00

Here's a 2-page note by Warren Smith, who appears to have independently (but subsequently) discovered the linear-time version of Walker's method. It works a lot like quicksort partitioning.

Mihai, I think the idea is that in step 1, if the ...

2011-09-16T16:55:10.001-04:00

Mihai, I think the idea is that in step 1, if the tallest column has height 1/n + eps and the shortest one has height delt, you don't just move eps, but 1/n - delt. Then each column has only one break. (That's what I got from the pictures in the PDF, at least)

I would love to hear the fancy data structure solution!

The space is O(n) and the preprocessing time=sorti...

2011-09-16T15:24:43.714-04:00

The space is O(n) and the preprocessing time=sorting time.

Anon Rex, I don't quite see how this solves the problem. (But it's likely that I'm misunderstanding the algorithm).

If many columns have height 1/N + eps, and some are much below 1/N, you need to move mass from many high columns to a low one. Then each column has many breaks inside it, and you need something like predecessor search to select among the breaks. Can you clarify what the algorithm really does?

For the challenge, what pre-processing time are yo...

2011-09-16T15:06:53.526-04:00

For the challenge, what pre-processing time are you allowing us?

How much space am I allowed to use?

2011-09-16T15:01:21.067-04:00

How much space am I allowed to use?

That is a cute question. No fancy data structures...

2011-09-16T14:59:04.137-04:00

That is a cute question.

No fancy data structures are required using the solution from slides 41--48 of this PDF: http://www.eurandom.nl/events/workshops/2009/quantative_models/BusicSIMParf.pdf

The solution there, known as the Walker method or alias method, has O(n log n) preprocessing and O(1) query time (both worst-case). The preprocessing can easily be improved to O(n), which was done by Kronmal and Peterson or can be easily done in a couple minutes.