Saturday, January 31, 2009

Morse meets Hamming

A while ago (2003), I proposed the following problem in the Romanian national olympiad:

You are given an alphabet of n letters, and the frequency p1, ..., pn with which each letter is transmitted. The goal is to find an optimal prefix-free binary encoding for the alphabet (as in Huffman coding). However, the code will be used for transmission over a telegraph, so the cost of a "zero" and "one" are different: a zero (dot) takes 1 second to transmit, and a one (dash) takes 2 seconds. Find an optimal code according to these costs and the given frequencies.
If you are an algorithmist, you will probably have fun thinking about the problem. What is the best running time that you can come up with? (I believe the best known is O(n2); getting any polynomial is already non-trivial.)

Generalizing this "telegraph problem" to arbitrary letter costs gives an interesting, SODA-esque research question that has been investigated quite a bit, but remains largely open. Say that a "zero" has cost α and a "one" has cost β (say β>α). The best known running times are nβ+O(1), but the problem is not known to be NP-hard for large β. Can you either show hardness or improve the upper bound?

If you are interested in previous work (including approximation algorithms), see Mordecai Golin's webpage, and follow references.

Anonymous said...

I was working on this with Mordecai for a while. We were trying to prove it's NP-hard. We have another problem P' such that if P' is NP-hard, then the problem you showed in the post is probably NP-hard, and P' is much simpler and it should be reasonable to prove it's NP-hard (we didn't manage to). I won't post here what P' is, but anyone reading this should be free to email me and I'll tell them what the problem is.

Chandra said...

What is known if we settle for a code that is within say (1+\eps) of the optimal code?

thanks

Anonymous said...

this is the easy case of this problem:

http://projecteuler.net/index.php?section=problems&id=219

Mihai said...

ilyaraz, the problem you mention assumes the probabilities are uniform, so it is quite different.

Chandra, a PTAS is known:
http://www.cse.ust.hk/~golin/pubs/LOP_PTAS_STOC.pdf

Anonymous said...

You can see, that I've mentioned, that it is easy case. :)