Tuesday, July 31, 2007

Bijective Combinatorics

I was remarking previously that this blog is meant to be somewhat similar to Terry Tao's blog, but without the associated Fields medal. Maybe a more important difference however, is that while Terry Tao is (still) doing old Math, it seems I am working on the new Math. And what better plot twist could reinforce our community's claim, than to have algorithms become not only the language of science, but the language of Math itself. This post is about an area of Mathematics where this may end up happening soon.

Bijective combinatorics is the area of combinatorics where algorithm is king. To count some set S, we find another set T of known size, and establish an explicit bijection f between them. "Establish" and "explicit" mean we have to give an algorithm for f and one for f -1.

The field owes a lot to Richard Stanley, another member of La Familia. (No, we are not a Mafioso group trying to gain control of science, and constantly promoting each other's work. Talk to my lawyer.) The central assertion is that the algorithm is the most beautiful way to understand combinatorial counting -- a normal step is Mathematics' long quest for beauty by one standard or another.

Here are two examples of nice puzzles in the area. If you are only interested in "serious business" skip to the section on the role of CS theory.

Example 1: Prüffer codes. To get a flavor for the technique, let us count the number of trees that can be formed with n labeled nodes (i.e. nodes have personality, 1-2-3 is a different tree from 1-3-2, but the same as 3-2-1). We encode such a tree using an array A[1..n-2] of labels as follows:

for i=1 to n-2:
L = min {label(v) | v is a leaf}
A[i] = label(unique neighbor of node labeled L)
remove the node labeled L from the tree
Exercise (quite fun): Now give an algorithm to reconstruct the tree uniquely from the encoding (an array of n-2 labels). Bonus points if the algorithm runs in time O(n).

This shows there are precisely nn-2 labeled trees. Like many nice puzzles I know, I first saw this as a problem in the Informatics Olympiad, around 9th grade.

Example 2: Balanced parentheses. The number of strings containing n pairs of balanced parentheses, e.g. (()(())) for n=4, is {2n \choose n} / (n+1). This is known as Catalan's number.

Exercise (not so easy, but rewarding): Show an algorithm that encodes
a set of n elements from [2n] into a pair of the form: a number in {0,..n} and a string of n balanced parentheses. Then show a decoding.

Where we come in. All nice and fun. But since algorithms are the words of this corner of science, our challenge is to create metaphors and poems. As it turns out, combinatorialists have some problems they haven't figured out how to solve despite a lot of effort. Some of the most famous ones are about partitions, an area where bijective combinatorics is particularly strong. If you want to dig further, see a survey by fellow Mafioso Igor Pak.

Maybe, just maybe, the reason they can't solve the problems is that no efficient bijection exists. And that would be something great for us to prove, and bring complexity theory to the spotlight.

An immediate question is what "efficient" means. People versed in algorithmic counting (I promise to blog about this another time) will immediately see that we can find polynomial time algorithms. Thus, polynomial time is not the answer here. Maybe the mathematical notion of explicit is more like log-space, or small depth circuits, or whatever.

I worked on the problem briefly a few years back, and talked to Igor Pak about it. He didn't have an ultimate definition for what we want, as common bijections turn out to be in many small complexity classes. But the consensus arose that the initial step is to prove some impossibility, e.g. for small space or small depth, and resolve the deep philosphical question later. (There is always ample time for that, given the sloooow progress that complexity theory tends to make.) Alas I was not successful in proving anything.

Miklós Ajtai. And that was all I heard about the problem, until lunch today, when Mikki Ajtai told me he had been working on it, and thinks he has something. Specifically, the statement seems to be of the form: there are two languages recognizable by AC0 such that there is no AC0 bijection between them.

Judging from previous examples (like the first predecessor lower bound, parity not in AC0, the superlinear lower bound for branching programs, AKS sorting networks, lattice-based encryption, ...) one can hope/expect a result that will:

  • be painfully incomprehensible
  • preocupy us for at least a decade, until it is distilled into a few brilliant ideas
  • forever enter our collective reasoning process and our understanding of CS
  • earn a permanent entry in Encyclopedia Informatica


rgrig said...

I seriously consider stopping reading this blog after I spent 8h to solve Exercise 2 :) It makes me feel stupid. :)

Very brief, decoding is like this: Take the parentheses string and the number k. Identify the kth pair in the string. Swap ( and ) in that pair and in all pairs in which the kth pair is nested. then write ( as 0 and ) as 1. For example if k=2 and the string is (())() then you change to ))((() which means the subset 110001. Encoding is kind of similar.

I don't have a proof but I'll stop for today. It's late.

Anonymous said...

The Catalan number is for n pairs of parentheses which are correctly matched.

MiP said...

The Catalan number is for n pairs of parentheses

Oops, fixed. Thanks!

MiP said...

@rgrig: Actually, I found the problem quite difficult also. I cannot remember how much it took to solve it. In any case, no need to ruin your life ;)

grad student at waterloo said...

Here's a solution that's easier to visualize:

consider an n by n grid, theres 2n choose n paths from bottom left to top right, the encoding of all such paths correspond to all n-subsets of [2n]. For any given path p, decompose it into catalan paths. The encoded integer is #of times p crosses the line y=x. Then each catalan path can be encoded as a balanced parenthesis, you simply concatenate them all into one piece.

Nilay Vaish said...

I have an O(n log n) solution for tree decoding problem.
1. Read the initial array to find out the degree of each node (no of occurrences + 1). Create a heap with all nodes having degree 1 (leaves) ordered as per their labels.
2. Now iterate over the array from beginning. At ith step,
a. create an edge between the heap minimum and the array[i]. Delete the minimum.
b. Decrement the degree of the node with label array[i]. If this reduces the degree to 1, add it to the heap.
3. Finally add an edge between two nodes remaining in the heap at the end.

Can you outline the O(n) solution?

MiP said...

To make your algorithm O(n), just use a linear structure (queue?) instead of a heap. Store an array D[1..n], where D[i] contains a list of nodes with degree i. You hold a pointer to the minimum nonempty D[i], and just increment it through this array.