Saturday, January 31, 2009

Morse meets Hamming

A while ago (2003), I proposed the following problem at the Romanian national olympiad:

You are given an alphabet of n letters, and the frequencies p1, ..., pn with which the letters are transmitted. The goal is to find an optimal prefix-free binary encoding for the alphabet (as in Huffman coding). However, the code will be used for transmission over a telegraph, so the costs of a "zero" and a "one" are different: a zero (dot) takes 1 second to transmit, and a one (dash) takes 2 seconds. Find an optimal code according to these costs and the given frequencies.
If you are an algorithmist, you will probably have fun thinking about the problem. What is the best running time that you can come up with? (I believe the best known is O(n²); getting any polynomial is already non-trivial.)
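To make the cost model concrete, here is a minimal brute-force sketch (not the intended polynomial-time solution): it enumerates all full binary code trees for a tiny alphabet, prices each codeword with dot = 1 and dash = 2, and assigns letters to leaves by the rearrangement inequality (most frequent letter gets the cheapest codeword). All function names are mine, for illustration only.

```python
def full_binary_trees(n):
    """Yield all full binary trees with n leaves.
    A leaf is None; an internal node is a (left, right) tuple."""
    if n == 1:
        yield None
        return
    for k in range(1, n):
        for left in full_binary_trees(k):
            for right in full_binary_trees(n - k):
                yield (left, right)

def leaf_costs(tree, dot=1, dash=2):
    """Transmission time of each leaf's codeword: left edge = dot, right edge = dash."""
    if tree is None:
        return [0]
    left, right = tree
    return [dot + c for c in leaf_costs(left, dot, dash)] + \
           [dash + c for c in leaf_costs(right, dot, dash)]

def optimal_telegraph_cost(freqs, dot=1, dash=2):
    """Exhaustive search over all code trees (exponential -- tiny alphabets only).
    For a fixed tree, the best letter-to-leaf assignment pairs the most frequent
    letters with the cheapest codewords (rearrangement inequality)."""
    p = sorted(freqs, reverse=True)
    best = float('inf')
    for tree in full_binary_trees(len(freqs)):
        costs = sorted(leaf_costs(tree, dot, dash))
        best = min(best, sum(pi * ci for pi, ci in zip(p, costs)))
    return best

if __name__ == "__main__":
    # Unlike standard Huffman coding, where only codeword lengths matter,
    # here the optimum also prefers dot-heavy codewords for frequent letters.
    print(optimal_telegraph_cost([0.5, 0.2, 0.15, 0.1, 0.05]))
```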

Generalizing this "telegraph problem" to arbitrary letter costs gives an interesting, SODA-esque research question that has been investigated quite a bit, but remains largely open. Say that a "zero" has cost α and a "one" has cost β (say β > α). The best known running times are n^(β+O(1)), but the problem is not known to be NP-hard for large β. Can you either show hardness or improve the upper bound?

If you are interested in previous work (including approximation algorithms), see Mordecai Golin's webpage, and follow references.

Tuesday, January 20, 2009

Talk at Berkeley

A long long time ago, when we took our first computer course, we learned that computers represent things in bits, not in base 10, since bits are easier to store (say, charged vs uncharged capacitor). Well, bits may be easier for computers, but apparently not much easier...

We can represent an array A[1..n] of digits using ⌈n log₂ 10⌉ bits on a regular computer, such that reading or writing any digit A[i] takes constant time.


Come hear the details at my talk tomorrow (Wednesday, 12pm, Wozniak lounge, Soda Hall, Berkeley). The write-up is forthcoming, and will be posted here.
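For intuition, here is the textbook baseline the result improves on (not the construction from the talk): pack B = 3 decimal digits per 10-bit block, since 10³ = 1000 ≤ 1024 = 2¹⁰. That gives O(1) reads and writes at about 3.34 bits per digit, slightly above the log₂ 10 ≈ 3.32 the talk achieves exactly. The class and its names are mine, purely illustrative.

```python
import math

class PackedDigits:
    """Decimal digits stored in fixed-width blocks: each block packs B digits
    into one BITS-bit word (valid because 10**B <= 2**BITS). Reading or
    writing a digit touches only its own block, hence O(1) time.
    Space is BITS/B bits per digit (10/3 ~= 3.34 here), slightly above the
    log2(10) ~= 3.32 bits/digit of an exactly optimal representation."""
    B, BITS = 3, 10   # 10**3 = 1000 <= 1024 = 2**10

    def __init__(self, n):
        self.n = n
        # Each entry is an integer < 10**B, so it fits in BITS bits.
        self.blocks = [0] * ((n + self.B - 1) // self.B)

    def read(self, i):
        """Return A[i]."""
        block, pos = divmod(i, self.B)
        return (self.blocks[block] // 10 ** pos) % 10

    def write(self, i, digit):
        """Set A[i] = digit, with 0 <= digit <= 9."""
        block, pos = divmod(i, self.B)
        old = (self.blocks[block] // 10 ** pos) % 10
        self.blocks[block] += (digit - old) * 10 ** pos

if __name__ == "__main__":
    A = PackedDigits(8)
    for i, d in enumerate([3, 1, 4, 1, 5, 9, 2, 6]):
        A.write(i, d)
    print([A.read(i) for i in range(8)])                       # [3, 1, 4, 1, 5, 9, 2, 6]
    print(PackedDigits.BITS / PackedDigits.B, math.log2(10))   # ~3.33 vs ~3.32 bits/digit
```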