Sunday, October 28, 2007

Rational Search

We all know the number guessing game. I think of an integer between 1 and n, and you try to guess it by comparison queries, "is it larger than x?". Binary search takes ceiling(log2 n) queries, which is of course optimal.

But what if I think of a rational number a/b , with 0<a<b<n? Can you still guess it with a few comparison queries? Take a few seconds to show that you can guess it with 2log2 n + O(1) queries, and that's optimal. (It's a 1-line proof.)

x
x
x
x
x
x
This space intentionally left blank.
x
x
x
x
x
x


Two fractions with denominators up to n cannot be closer than 1/n2. Thus, you binary search for k such that the number is in the interval [k/n2 , (k+1)/n2). This interval contains a single fraction, so you can just go through all fractions and find the one inside. On the lower-bound side, there are on the order of n2 fractions, so it takes at least 2log2 n - O(1) queries with binary output.

The moral of the story is, of course: be careful about the difference between actual running time and query complexity. But now let us show that you can actually find the unique fraction in the small interval with running time O(lg n). There are several papers doing this, including:

  • Christos Papadimitriou, Efficient search for rationals, IPL 1979.
  • Steven P. Reiss, Efficient search for rationals, IPL 1979
  • Stephen Kwek and Kurt Mehlhorn, Optimal Search for Rationals, IPL 2003.
Here, I'm going to describe a nice solution based on Farey fractions. The Farey sequence of order n is the sorted list of irreducible fractions a/b, with abn. For example, for n=4, we have 0/1, 1/4, 1/3, 1/2, 2/3 3/4, 1/1.

The really cool property of irreducible fractions is that if a/b and c/d are consecutive in some Farey sequence, then the "mediant" (a+b)/(c+d) is the first fraction that appears between them in a higher-order sequence. For example, the first fraction to appear between 1/2 and 2/3 will be 3/5.

With this property, we can construct the Stern-Brocot tree, which is an infinite tree exploring all fractions less than 1. The following picture from Wikipedia should explain everything:

The intuitive algorithm is to go down this tree, looking for our fraction. Unfortunately, fractions with small denominators can appear very deep in the tree -- for example, 1/n is at depth n-1. Thus, we get O(n) time.

To improve this, we first show that the number of "turns" of the exploration is at most O(lg n). By turn, I mean switching from a going down on left children, to going down on right children (note that the 1/n example above had zero turns). Looking at the tree, it is easy to see that a turn essentially doubles denominators, so the property is easy to show.

Now we have to explore the straight descents efficiently. Let's say the fraction falls between a/b and c/d, and we take t steps to the left. This will land us at precisely (ta+b)/(tc+d). We can solve this equation for t, and find the minimum value which makes the fraction less than the target interval. Thus, we can calculate the number of consecutive left steps in constant time.

If we're too lazy to solve the equation, we can binary-search for t. Here we use galopping binary search, i.e. we take time O(lg t) to guess the unknown t.

Surprisingly, this will not give O(lg2n) time in total, but still takes O(lg n) time. This follows because the sum of the lg t values traces the number of times the denominator doubles, and it can only double O(lg n) times.

Overall, we have query complexity, Farey sequences, doubling arguments, galopping binary search, and summing up logarithms. This probably makes very nice material for an undergraduate "Math for CS" course.

(As a person who has never taken, nor taught such a course, I of course feel entitled to have opinions about it.)

Thursday, October 11, 2007

Range Queries (I)

I am now in Bonn, spending some time talking to Yakov Nekritch about range queries. Since this is a field I care about a lot, I will write a series of posts on it.

First, let me begin with my spiel on the importance of studying range queries. This is taken from the beginning of my talk at UW in May.

Once upon a time, you go to a bar in NY and you're trying to pick up a nice woman. She tells you she works in HR, and you tell her you are a theoretical computer scientist. She confesses she has not idea what that means, at which point you clarify "Oh, I am trying to understand the fundamental nature of computation". The blank stare you get is a clear indication that some more clarification is needed.

What do you then explain at this crucial moment of the night? Though I have not tried the following options myself, the prevailing opinion is that they are ill-advised:

  • I am trying to prove lower bounds for constant depth circuits with mod-6 gates.
  • I am trying to extract pure randomness from 2 independent sources with small min entropy.
  • I am trying to design codes of subexponential size with efficient local decodability.
  • I am trying to show that a diagonalization argument cannot separate P and NP, even if it is given a low degree extension of an oracle.
One can complain that too few people understand what computing and computation actually mean. That is a valid complaint, and we should indeed work with precollege education to change things.

But here you're in luck. The gorgeous HR woman actually uses computers every day, and she knows exactly what they're supposed to do. Chances are, she even has some exposure to a programming language! That is of course SQL, which is very useful for her work. The only trouble is -- for her, and for most of the planet, computers are a great tool for data storage and manipulation, not a great tool for finding Hamiltonian paths.

So what do you tell her? Anybody who has attended 5 minutes of a SQL tutorial has seen examples like:
SELECT * FROM employees
WHERE salary <= 70000 AND startdate <= 1998
In fact, she probably runs code like this on a daily basis. Thus, she already understands orthogonal range queries -- formally, you are given a set of points, and you ask for points in some rectangle.

You can now explain that your work is about how fast such queries can be executed, and, with her curiosity satisfied, move to more charming topics.


PS: This post is not about bad-mouthing complexity theory. For a long time, I have considered myself a complexity theorist, and I am interested in many topics in the field, well beyond my own research. The funny references are to some very particular young-and-eager complexity theorists, who have occasionally exasperated me with comments like "Yeap, data structures can be interesting sometimes, but really our goal is to understand computation". The point, if it was not clear enough, is that data structures are certainly not more artificial as our other computational questions -- maybe most computational questions that computer users care about are data-structure (database) questions.

Friday, October 5, 2007

Schedule

For the past two weeks, I have been enjoying the unique atmosphere of Romania. I promise to blog a bit about Romania and its universities next week, since most people are probably not aware of this parallel academic universe.

Now it is time to get back into the real world, and possibly do some work. Of course, on the way to doing work, I might as well have some fun. My upcoming schedule is as follows:

  • Oct 6: Timisoara (my first visit to this reportedly beautiful city in West Romania)
  • Oct 7: Budapest
  • Oct 8: Bratislava (first time in Slovakia)
  • Oct 9: Amsteram
  • Oct 10 -- Oct 16: Bonn
    I am visiting Yakov Nekritch, whose homepage picture reminds me of pre-1989 Romania (a mid-level Party administrator for agriculture, looking proud of his crops).

    On Monday, Oct 15, at 10am, I am giving the following talk:

    Dynamic Graph Algorithms invade Geometry
    After dynamic graph algorithms has been an area of intense research in data structures for almost 30 years, the time seems ripe for the field to tackle problems of a geometric nature. In dynamic graph algorithms, we have a graph that is being changed through updates (typically edge insertions and deletions), and we want to query it for various graph properties (say, whether two nodes are connected, the cost of an MST, the shortest path between two nodes etc).

    Now consider a scenario in which we have sensors distributed in the space, and each sensor can only talk to another sensor at radius r. The connectivity is defined by the intersection graph of discs of radius r/2 around every point. Can we handle a dynamic environment, e.g. a situation when new sensors can be inserted, or some sensors run
    out of battery?

    Our results apply to the intersection graphs of a very general class of objects (includings disks, segments, rectangles) and support the dynamic connectivity problem in time O(nc) per update and query, where c is a constant less than 1 depending on the geometric shape in question. This opens a very broad and promising research direction:
    • can new graph queries be supported (e.g. shortest path queries)?
    • what is the best running time for each important geometric shape?

    Joint work with Timothy Chan and Liam Roditty.

  • Oct 16 -- Oct 20: Brussels (first time in Belgium)

    Here, I am visiting Stefan Langerman. On Thrusday, Oct 18 at 12:30, I am giving a talk:

    Farey Sequences and Counting Primitive Lattice Points
    A primitive lattice point is a point (x,y) with x, y integers and gcd(x,y)=1. "Counting" primitive lattice points inside various planar shapes has been an active area of mathematics in the past decades. But by virtue of seeking an elementary formula, the mathematical definition of "counting" is "approximating asymptotically". We now present the first results on the informatics view of this problem: give an efficient algorithm to count primitive lattice points exactly.

    The problem is related to another entertaining mathematical problem that has crossed the border into informatics. The Farey sequence of order N is the sorted sequence of irreducible fractions a/b, with abN. There are well-known algorithms for generating this sequence in its entirety. We now describe algorithms for finding just one value of the sequence (say the K-th value) significantly faster.

    Joint work with Jakub Pawlewicz.

  • Oct 20 -- Oct 23: FOCS in Providence.
    The talk is on Monday morning at 8:30, and I am giving it (Mikkel and I use the alternation principle for choosing the speaker). The paper is here.

    Planning for Fast Connectivity Updates
    Understanding how a single edge deletion can affect the connectivity of a graph amounts to finding the graph bridges. But when faced with d>1 deletions, can we establish as easily how the connectivity changes? When planning for an emergency, we want to understand the structure of our network ahead of time, and respond swiftly when an emergency actually happens.

    We describe a linear-space representation of graphs which enables us to determine how a batch of edge updates can impact the graph. Given a set of d edge updates, in time O(d.polylgn) we can obtain the number of connected components, the size of each component, and a fast oracle for answering connectivity queries in the updated graph. The initial representation is polynomial-time constructible.
I will blog about each paper after the talk. If you want to meet in Brussels, Bonn, or Providence, drop me a line.

Tuesday, October 2, 2007

CRA Award

In 2005, I received CRA's Outstanding Undergraduate Award (my citation, with a nicely kempt picture).

I have been asked a bunch of times (most recently yesterday) for "insider" tips, my nomination materials, opinions etc. So why not post all of these publicly on the blog, to promote competition...

Here is the research statement and the CV that I submitted back then (2004). It's nice to see the CV has changed a bit since then, not only in the contact info section ;)

Below is some "advice" and personal opinions that I wrote in emails. But there is a catch that you must be aware of -- I certainly did not have a typical application. I had a bunch of research even back then, and I got the award based on work in my sophomore year, which is very very atypical. Thus, you can either think that (1) the application was good enough to earn me the award so early, so my advice is trustworthy; or (2) my circumstances were sufficiently weird that the application didn't really matter, so my advice is worthless.

For what it's worth it, here was the formula I used:

  • I got the advice (maybe it was even official advice from CRA) to describe carefully what my contribution was on the papers, so I do that to some extent in the statement. At a later stage in life (post PhD), this would probably be unusual.

  • I was also told that some contribution to humanity might not hurt, so the CV emphasized my involvement in organizing high school computer olympiads. Still I decided not to dramatize it too much.

  • I did not mention intended future work, broader impact, and the like. My reasoning was that empty talk is a skill that senior researchers need to develop for grants. It is understood that empty talk is needed, so including it will not hurt your reputation.

    But if young students write the kind of things we normally write in grants, they end up looking sily and childish. They should compete based on objective things they actually did, and get some respect based on objective criteria before the hype can begin.
Looking at the research statement in retrospect, I severely underestimated the impact of my predecessor lower bound with Mikkel (then in the writing). That was a fundamental paper --- the first separation between quasilinear space (say, n polylg n) and polynomial space (say n2). In data structures, all the action tends to happen below quadratic space, so this paper has already generated a lot of fall-out, and will probably be remembered into the future.

Still, the fact that I did not speculate on future impact was probably the right thing to do. As I said already, I believe the tone of a statement at the student level should be strongly objective.

Sunday, September 30, 2007

China (III)

I do not write about politics, since I could never be as good as some other people who do. A link to fellow theorist Bernard Chazelle, and the captivating essays on his webpage, is in order. (I hesitated posting a link not because I disagree with some of his arguments, which indeed I do, but because you will probably get distracted and not read the rest of my post... But render unto Caesar the things which are Caesar’s, I already said there are many people who write better than me.)

So, I did go to China, and I am back, and now I feel compelled to write a few words.

My friends in Romania ask me how the Chinese can stand their government, and why they are not fighting. Of course, the obvious guess on everybody's lips is panem et circenses -- the Chinese economy has been making strides forward, which allowed the government to keep citizens happy.


But another reason that cannot be ignored lies in the remarkable efficiency of Chinese propaganda among its citizens. A prime accomplishment of the communist machine is the fact that political discussion is so touchy among Chinese inteligentia in the US. The party has managed to convince even truly brilliant people that the state is the party and the party is the state. Then, if somebody challenges Chinese communism, they are attacking the Chinese patriotic sentiment. Love of the Vaterland becomes a shield of the regime.

Indeed, they are careful and proactive about propaganda. In the new age of communication, they have set up the world's best known firewall to keep the citizens away from subversive information. A non-technical person will not be able to get to Wikipedia, or to blogspot, or to any "unofficial" description of Tiananmen 1989, and will not be able to use encryption (ssh).

In fact, brainwashing and censorship were such a durable and effective efforts, that it became hard to get censors! A censor recently let a reference to Tiannmen be published because she thought June 4, 1989 was a mining disaster [Reuters].

Tiananmen has become the place where "On this spot in 1989, nothing happened".


Yet despite the brainwashing, there are those who are fighting (and currently losing). Taiwan's situation is constantly becoming worse, since every dollar that the booming Chinese economy earns is another dollar behind the economic blackmail in the One China policy. With Burkina Faso as the largest country to officially recognize Taiwan, their backing sadly reminds us of the grand Coalition of the Willing.

Inside the People's Republic, a cultural fight is led. In good Soviet tradition, the Party's policy was: grab as much land and people as possible, and rebrand everything as Chinese. (Han Chinese, of course.)

Tibet and Xinjiang are just about as Chinese as Finland. But in the past half-century, the Party has managed to bring 7 million or so Han into Xinjiang, a land populated by Turkic people (Uyghur and Kazakhs). This raised the percentage of Han from zeroish to 40%, and will soon make Xinjiang "Chinese", a feat that would have made the Father of Nations, Joseph Stalin, exceedingly proud.

As for Tibet, it may not even be needed to resort to ethnic invasion. Here is the simple story. The Dalai Lama and the Panchen Lama assure a continuity in Tibetan spiritual leadership. When one dies, the other searches for a reincarnation, and "takes care of business," while the young boy found to be the reincarnation grows up, and is educated to be a great Lama. When Panchen Lama died, the Dalai Lama (from exile in India) selected a new Panchen Lama in communication with a search commitee wandering across Tibet.

In 1995, the young boy became a political prisoner at the unlikely age of 6, and he has not been seen since. In its infinite wisdom and understanding of Tibetan culture, the Party chose the next Panchen Lama, and took the boy to Beijing. From there, he has been making periodic statements about the great freedom of religion enjoyed by the Chinese people living in Tibet.

Now, when the Dalai Lama dies, can anyone guess on what criteria the Panchen Lama (Beijing version) will select a reincarnation?

In the mean time, any attempt to cross into India and communicate with the Dalai Lama is a treason to the Chinese state. Here is a ghastly piece of footage caught by a Romanian cameraman while climbing in the Himalayas (which pedictably won some accolades at the Emmy Awards this year).

It shows a Chinese border patrol executing Tibetans who are trying to cross the mountain. Via the New York Times, we learn that China acknowledged the incident, and justified it as self-defense after the monks tried to attack the soldiers.




Thursday, September 20, 2007

Slepian, Wolf and relatives (III)

To wrap up this thread of posts, let me discuss some motivation and directions for asymmetric communication channels, described in post II.

In my talk in China, I mentioned the following naive motivation. We have two sensors in a river. Given the average velocity of the flow, we can deduce that the readings of the sensor upstream at some time t, will be correlated with the readings of the sensor downstream at some time tt. The correlation will not be perfect (otherwise we wouldn't use two sensors), but it is enough to compress the message. However, the sensor downstream doesn't know the distribution of his own input, since he doesn't know the other sensor's reading at an earlier time.

This example is simple and intuitive enough that it should work as PR. Here are the more serious motivations that get thrown around:

  1. sensor networks (in some abstract way)
  2. internet connections (most ISPs give you faster download than upload; downloading from a satellite is significantly easier than uploading)
  3. stenography and combating consorship. This comes from the paper: [Feamster, Balazinska, Harfst, Balakrishnan, Karger: Infranet: Circumventing Web Censorship and Surveillance. USENIX Security Symposium 2002]
About 1., an Anonymous commenter on my earlier post believes that the academic work on sensor networks is mostly PR. Piotr explains that the problem is that you want to do some source processing, as opposed to just sending some large sample, and it's hard to understand any distribution after source processing. This is a very valid concern for both the Slepian-Wolf model, and the asymmetric communication model.

Moving on to 2., I have not seen examples where this approach might be useful at the scale of realistic ISP links, nor examples of how you might get the server to know the distribution better than you.

About 3., I have a feeling that it is simply accidental cobranding. To get your paper accepted to a practical conference, it helps to use some fancy theoretical work somewhere (maybe a place where you don't even need it), and theoretical work can claim practical importance if somebody uses it in a supposedly practical conference. I am reminded of a funny quote: "I would gladly work on the practical side of academia, but so little of practice actually applies to practice..."

Thus, unfortunately, I do not find the motivations I heard so compelling (outside of theoretical beauty with which I will fully agree).

What is probably needed is more attention towards the EE end of the discipline. Is anybody aware of any EE-work on this problem? In our abstract setting (Bob knows a distribution), any algorithm is ultimately impractical, because nobody works with perfect knowledge of a distribution (a distribution is a big object!). What we need is better modelling of what "understanding a distribution" means, and algorithms based on that. To give the prototypical sily example, the distribution may be (roughly) normal, so Bob only needs to know the mean and standard deviation.

On the lower bound side, I have some rough ideas to prove that the lower bound holds even if the distribution is the trajectory of a Markov chain of polynomal size (as opposed to the distribution in our paper, which really took exponentially many bits to describe).

Wednesday, September 19, 2007

Slepian, Wolf and relatives (II)

I didn't really get the answers I was hoping for in my original post on Slepian-Wolf coding, but let's move to the second part.

This post is about "asymmetric communication channels". Assume Alice receives an input x in {0,1}n, and she wants to communicate it to Bob. If x comes from some distribution D of less with entropy less than n, Alice and Bob should agree to communicate using a Huffman code and send ≤ H(D)+1 bits.

But what if D is only known to Bob, and not to Alice? Can Bob send something intelligent, which is still much less than the 2n description of the distribution, but which allows Alice to communicate efficiently? If you want to hear motivations and applications of this setup, I will discuss them in my next post. Personally, I find this "can you communicate a Huffman code" question sufficiently natural and intriguing.

It is not hard to show that the total communication needs to be at least n. Thus, a nontrivial protocol is only possible in an asymmetric setting, where Bob is allowed to communicate more than Alice.

Upper bounds. The problem was first studied by Adler and Maggs [FOCS'98], who gave a protocol in which Bob sends expected O(n) bits, and Alice sends expected O(H) bits. This is a surprising result: Bob is sending some sketch of the distribution that is logarithmic in its description, but with this sketch Alice can compress her input down to entropy.

There is one catch, though: the protocol uses O(1) rounds in expectation. It does not provide a guarantee for any fixed number of rounds.

With this catch, you can start to get a glimpse of how the protocol might run. Up to constant factors, a distribution with entropy H looks like a distribution putting probability 2-(H+1) on some 2H samples (mass 1/2), then probability 2-(H+2) on other 2H samples (total mass 1/4), probability 2-(H+3) on other 2H samples (mass 1/8) and so on. Bob is going to send a hash function {0,1}n -> {0,1}H which is injective on the 2H samples on the first level, and Alice returns the hash code. Bob sends the sample that maps to that hash code, and if Alice says he was wrong, they continue to the second level (which will only happen with probability 1/2). This continues, reaching level i (and round i) with probability 2-O(i).

There has been a lot of follow-up work, which tries to reduce the constants in the communication. Some papers have Alice communicating only H+2 bits, for instance. However, I do not like these results because they use roughly H rounds in expectation. In real life, a round of communication is much much more expensive than sending a few extra bits.

In SODA'05, Adler extended the protocol to the case where there are k Alice's, which have samples x1, ..., xk to communicate. These samples come from a joint distribution, known only to Bob.

Lower bounds. Micah Adler came to a problem session organized by Erik, and told us about the problem, highlighting the weird behavior that the number of rounds is geometric. Personally, I was fascinated by the question of whether this was tight.

Unfortunately, the proof eventually required not only some nice ideas, but also a lot of work in balancing everything just right. I spent a few months paying long daily visits to Nick's office, before we finally managed to make it go through. In the slides for my talk in China, I associated this problem with a bas-relief depicting galley slaves.

The proof uses round elimination and message switching (which I will talk about in future posts), but in an unusual context: you need to keep recursing in these smaller and smaller probability spaces.

Eventually, we showed that i rounds are needed with probability 2-O(i lg i), which is fairly close to optimal, and demonstrates that any fixed number of rounds is not enough. In the good tradition of sharp phase transitions, the lower bound holds even if Bob can communicate an outrageous amount: 2^{n1-ε}.

Our paper appeared in SODA'06. The paper and slides have fairly good intuition for how the bound works.