The [Splepian-Wolf'73] coding scheme is a famous result in coding theory, that inspired the field of distributed source coding. I guess this topic would be better for Michael's blog, but I am curious to ask some questions, and I want to describe a slightly related beast in the next post.
Review. You have k clients, and a server. The clients have some inputs x1, ..., xk, each of n bits, and they want to communicate all inputs to the server. Now assume these inputs come from some joint distribution with entropy much less than nk. Can one use similarly little communication?
If the distribution is rectangular, i.e. the xi's are independent, each client i can perform source coding (say, Huffman coding) based on his own distribution, achieving H(xi) communication. With dependencies, if clients broadcast their messages in order, each client can perform source coding based on his distribution conditioned on the previous messages. This achieves optimal total communication, since Sumi H(xi | x1, ..., xi-1) = H(x1, ..., xk).
Slepian and Wolf showed that it is possible to achieve optimal communication even if all clients communicate one message simmultaneously, and the messages are only seen by the server.
In fact, there is a whole simplex of possible scenarios for optimal communication. Consider k=2 to make notation easy. Then, it is possible for client 1 to send m1 bits, and client to to send m2 bits as long as:
- m1+m2 ≥ H(x1, x2)
- m1 ≥ H(x1 | x2)
- m2 ≥ H(x2 | x1)
Since then, people have developed efficently encodable and decodable distributed codes for all sorts of important special cases. A "special case" means what makes the inputs correlated (they are close in what metric?). To give realistic examples, the inputs may be close in Hamming distance, or in edit distance, or they may be numbers that are close in value (say, two noisy observations of the same event).
Questions. I never actually worked on this (but on something related, see next post), so I only have some meta questions. The field is usually sold as being important in sensor networks. Is this "for real", or just theoretical PR? People interested in sensor networks on our side of theory (eg the streaming folks) do not seem too aware or interested in this field. Why?