"What makes it tough for some of the papers you cite is the view that shaving off log factors is often viewed as much less interesting than larger improvements."
This, of course, makes my Latin blood run even hotter, and I cannot help writing follow-up posts (this is the first). If only I could keep my promise of not writing about politics, my life would be so much simpler. (If only I could learn from history... I got to observe my father become a leading figure in Romanian dermatology a decade before he could get a faculty position — mainly due to his Latin blood. He got a faculty position well into his 50s, essentially going straight to department chair after the previous chair retired.)
So, let's talk about shaving off log factors (a long overdue topic on this blog). As one of my friends once said:
"All this talk about shaving off log factors from complexity people, who aren't even capable of shaving a log factor onto those circuit lower bounds..."
There is something very deep in this quote. Complexity theorists have gone way too long without making progress on proving hardness, their raison d'être. During this time, drawing targets around the few accidental arrows that hit walls became the accepted methodology. For instance, this led to an obsession with the polynomial / non-polynomial distinction, where at least we had an accepted conjecture and some techniques for proving something.
Complexity theory is not about polynomial versus non-polynomial running times. Complexity theory is about looking at computational problems and classifying them "structurally" by their hardness. There are beautiful structures in data structures:
- dictionaries take constant time, randomized. (But if we could prove that, deterministically, dynamic dictionaries need superconstant time per operation, it would be a very powerful message about the power of randomness — one that computer scientists could understand better than "any randomized algorithm in time n^c can be simulated deterministically in time n^{10c} if E requires exponential-size circuits.") A minimal sketch of such a dictionary follows this list.
- predecessor search requires log-log time. The lower bound uses direct sum arguments for round elimination in communication complexity, a very "complexity topic." A large class of problems are equivalent to predecessor search, by reductions.
- the hardness of many problems is related to the structure of a binary hierarchy. These have bounds of Θ(lg n) or Θ(lg n / lglg n), depending on interesting information-theoretic issues (roughly, can you sketch a subproblem with low entropy?). There are many nonobvious reductions between such problems.
- we have a less sharp understanding of problems above the logarithmic barrier, but knowledge is slowly developing. For instance, I have a conjecture about 3-player number-on-forehead games that would imply n^{Ω(1)} lower bounds for a large class of problems (reductions, again!). [This was in my Dagstuhl 2008 talk; I guess I should write it down at some point.]
- the last class consists of the "really hard" problems: high-dimensional problems for which there is a sharp transition between "exponential space and really fast query time" and "linear space and really slow query time." Whether or not there are reductions among these is a question that has preoccupied people for quite a while (you need some gap amplification, a la PCP). Right now, we can only prove optimal bounds for decision trees (via communication complexity), and some weak connections to NP (if SAT requires strongly exponential time, partial match requires weakly exponential space).
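To make the first item above concrete, here is a minimal sketch of the kind of randomized dictionary I mean: a chained hash table with a random multiply-shift hash function, giving O(1) expected time per operation. This is a generic textbook sketch (no resizing, no deletions), not any particular paper's construction; the open problem is matching it deterministically with dynamic updates.
```cpp
#include <cstdint>
#include <list>
#include <random>
#include <utility>
#include <vector>

// Minimal chained hash table over 64-bit keys: O(1) expected time per
// operation once the random hash function is fixed. (Sketch only; no
// resizing, no deletions.)
struct Dictionary {
    std::vector<std::list<std::pair<uint64_t, uint64_t>>> buckets;
    uint64_t a;  // random odd multiplier (multiply-shift hashing)
    int shift;

    explicit Dictionary(int log_buckets = 16)
        : buckets(1u << log_buckets), shift(64 - log_buckets) {
        std::mt19937_64 rng(std::random_device{}());
        a = rng() | 1;  // the multiplier must be odd
    }
    size_t hash(uint64_t key) const { return (a * key) >> shift; }

    void insert(uint64_t key, uint64_t value) {
        buckets[hash(key)].emplace_back(key, value);
    }
    // Returns a pointer to the value, or nullptr if the key is absent.
    const uint64_t* find(uint64_t key) const {
        for (const auto& kv : buckets[hash(key)])
            if (kv.first == key) return &kv.second;
        return nullptr;
    }
};
```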
Let's look at algorithms:
- Some problems take linear time (often in very non-obvious ways).
- Sorting seems to take super-linear time, and some problems seem to be as fast as sorting. My favorite example: undirected shortest paths takes linear time, but for directed graphs it seems you need sorting. Why?
- FFT seems to require Θ(n lg n) time. I cannot over-emphasize how powerful an interdisciplinary message it would be if we could prove this. There are related problems: if you can beat the permutation bound in external memory, you can solve FFT in o(n lg n). The permutation bound in external memory is, to me, the most promising attack on circuit lower bounds.
- some problems circle around the Θ(n sqrt(n)) bound, for reasons unclear. Examples: flow, shortest paths with negative lengths, min convolution with a mask. But we do have some reductions (bipartite matching is as hard as flow, bidirectionally).
- some problems circle around the n^2 bound. Here we do have the beginning of a classification: 3SUM-hard problems (the standard quadratic algorithm for 3SUM is sketched after this list). But there are many more things that we cannot classify: edit distance and many other dynamic programs, min convolution (signal processing people thought hard about it), etc.
- some problems have an n*sort(n) upper bound, and are shown to be X+Y-hard. Though the time distinction between n^2 and n*sort(n) is tiny, the X+Y question is as tantalizing as they get.
- some problems can be solved in n^ω by fast matrix multiplication, while others seem to be stuck at n^3 (all-pairs shortest paths, given-weight triangle). But interestingly, this class is related to the n^2 problems: if 3SUM needs quadratic time, given-weight triangle requires cubic time; and if min-convolution requires quadratic time, APSP requires cubic time.
- what can we say about all those dynamic programs that run in time n^5 or something like that? To this party, TCS comes empty-handed.
- how about problems with super-polynomial, sub-exponential running time? Ignoring this regime is why the misguided "polynomial / non-polynomial" distinction is often confused with the very meaningful "exponential hardness." There is much recent work here in fixed-parameter tractability. One can show, for instance, that k-clique requires n^{Ω(k)} time, or that some problems require 2^{Ω(tree-width)} time.
- and what can we say about k-SUM and all the k-SUM-hard problems (computational geometry in k dimensions)? This is an important illustration of the "curse of dimensionality" in geometry. I can show that if 3SAT takes exponential time, k-SUM takes n^{Ω(k)} time.
- finally, what can we say about PTAS running times? In my paper with Piotr and Alex, we showed that some geometric problems require n^{Ω~(1/ε^2)} running time. This has a powerful structural message: the best thing to do is exhaustive search after a Johnson-Lindenstrauss projection.
- inside exponential running time, there is the little-known work of [Impagliazzo-Paturi] showing, for instance, that sparse 3SAT is as hard as general 3SAT. Much more can be done here.
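Since 3SUM anchors the quadratic class above, here is the standard O(n^2) algorithm for it: sort once, then do a two-pointer scan for each fixed element. This is just the textbook sketch referenced in the quadratic bullet; the conjecture is that nothing substantially faster exists.
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Classic O(n^2) 3SUM: is there i < j < k with a[i] + a[j] + a[k] == 0?
// Sort once, then for each fixed element scan the remaining suffix with
// two pointers moving toward each other.
bool has_3sum(std::vector<int64_t> a) {
    std::sort(a.begin(), a.end());
    const size_t n = a.size();
    for (size_t i = 0; i + 2 < n; ++i) {
        size_t lo = i + 1, hi = n - 1;
        while (lo < hi) {
            int64_t s = a[i] + a[lo] + a[hi];
            if (s == 0) return true;
            if (s < 0) ++lo;  // sum too small: move the left pointer up
            else --hi;        // sum too large: move the right pointer down
        }
    }
    return false;
}
```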
Lest we forget, I should add that we have no idea what the hard distributions might look like for these problems... Average case complexity cannot even talk about superpolynomial running times (a la hidden clique, noisy parity etc).
While we construct exotic objects based on additive combinatorics and analyze the bias of polynomials, we should not forget that we are engaging in a temporary exercise of drawing a target around an arrow — a great exploration strategy, as long as it doesn't make us forget where we wanted to shoot the arrow in the first place.
And while complexity theory is too impotent right now to say anything about log factors, it should not spend its time poking fun at more potent disciplines.
A lower bound on some algorithmic problem certainly implies a related lower bound on any data structure that can be used to solve the problem. The converse is, in general, not true.
(Do you know of cases where the converse is true? That is, where a data structure lower bound was used to prove a general lower bound on some algorithmic problem?)
It seems that lower bounds on algorithmic problems must a priori come from something more fundamental than data structures, since a data structure only provides a certain way of attacking a problem, whereas fully general lower bounds talk about *any* method of attacking a problem.
This is your best post, and perhaps the best post I read in any TCS blog.
I'm curious about your discussion of the log-log n lower bound for predecessor queries. This doesn't cover all data representations, does it? My understanding is that by using direct addressing (i.e., a bitvector), it can be done in constant time as Select(Rank(x) - 1) using Munro's tabling method. In 2002, Beame and Fich derived a Θ(sqrt(log n / log-log n)) bound for "trees of higher degree" representations. However, I'm not clear how these relate to the communication complexity lower bound you refer to here.
That is a nice overview of a good number of open problems in data structures and algorithms. Though the techniques and methodologies may be different, particularly in the realm of data structures I don't see such a schism between complexity and algorithms. Your own work in the area has as many lower bounds as algorithms and the two analyses often feed off each other.
In general, I don't think that we gain by setting complexity and algorithms in opposition to each other. The successes in the last 15 years or so in new polynomial-time approximation algorithms and in the hardness of approximation have come about in part because complexity and algorithms have traded ideas.
Also, I don't see any reason why algebraic algorithms should be any less algorithms than combinatorial or geometric ones. It is only an accident of history that such algorithms often get grouped with complexity (as in the journal Computational Complexity). If you exclude such algorithms you are missing out on some really first rate ideas - for example, my favorite recent data structure is the beautiful multivariate polynomial evaluation data structure due to Kedlaya and Umans (see section 5 of the paper.) It leads to a pretty great algorithmic improvement, too.
Even just the univariate data structure is awesome. For one, it says that the Carter-Wegman hash functions which we all have grown up with can be preprocessed for fast evaluation (question-to-self: actually, is there any point given Siegel above?).
(Research?) question: Can the dependence on q in Theorem 5.1 of http://www.cs.caltech.edu/~umans/papers/KU08-final.pdf be eliminated in word RAM (suppose w > log(q))? This would make the previous paragraph actually true for all GF(p).
I like this blog very much!
I will try to follow these good open problems you posted here :)
Mihai-
Very helpful list of problems/strata. But what do you mean by the X+Y problem? (Addition's in linear time, right?)
Andy: Given x_1,...,x_n and y_1,...,y_n, consider the array z of length n^2 with z_{i,j} = x_i + y_j. Question: can you sort z in o(n^2*log(n))?
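For reference, the baseline everyone wants to beat is the trivial one: materialize all n^2 sums and run a comparison sort, which takes O(n^2 log n) comparisons. A minimal sketch (nothing clever, just the baseline):
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Trivial baseline for X+Y sorting: build all n^2 pairwise sums and sort
// them, using Theta(n^2 log n) comparisons. The open question is whether
// o(n^2 log n) comparisons suffice.
std::vector<int64_t> sort_x_plus_y(const std::vector<int64_t>& x,
                                   const std::vector<int64_t>& y) {
    std::vector<int64_t> z;
    z.reserve(x.size() * y.size());
    for (int64_t xi : x)
        for (int64_t yj : y) z.push_back(xi + yj);
    std::sort(z.begin(), z.end());
    return z;
}
```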
ReplyDeleteAh, thanks!
Anon 1:
"A lower bound on some algorithmic problem certainly implies a related lower bound on a data structure [...] The converse is, in general, not true."
Yes, you are of course right. The reason I study data structures is that they are fundamental to computation (in my view, as fundamental as algorithms). Whenever friends from Google ask me questions, it's always some data structure they are looking for. Many things are naturally reactive systems, as opposed to pure computation problems (think databases, routers, google.com, etc.).
But let's say you only care about algorithms. Your options are pretty limited: either sit around and wait for better times, or try to prove things shy of your final goal. Without proving the algorithmic lower bound, the data structure lower bound is a very good compromise, because it is actually useful in its own right.
Plus, there is the Valiant argument: "a lower bound for this algorithm will need to explain, not merely imply, a lower bound for a data structure." What this means in practice: you're often searching for hard distributions for the data structure, which become candidate hard distributions for the algorithm. These distributions are amazingly useful when you have them (for many things on the list, I do not know conjectured hard distributions).
"(Do you know of cases where the converse is true? That is, where a data structure lower bound was used to prove a general lower bound on some algorithmic problem?)"
There is a bidirectional transformation between sorting and priority queues [Thorup, JACM]. Thus, if you ever prove a lower bound for priority queues, it will be a circuit lower bound for sorting.
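The easy direction of that equivalence is just heapsort: any priority queue sorts n keys with n inserts and n extract-mins, so a fast priority queue gives an equally fast sorting algorithm. A minimal sketch of this easy direction (the substance of [Thorup, JACM] is the converse, turning any sorting algorithm into a priority queue):
```cpp
#include <functional>
#include <queue>
#include <vector>

// Easy direction of the sorting <-> priority queue equivalence: n inserts
// followed by n extract-mins sort the input, so a fast priority queue
// yields an equally fast sorting algorithm (up to constant factors).
std::vector<int> sort_with_priority_queue(const std::vector<int>& input) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq(
        input.begin(), input.end());  // n inserts (heapified here)
    std::vector<int> sorted;
    sorted.reserve(input.size());
    while (!pq.empty()) {             // n extract-mins
        sorted.push_back(pq.top());
        pq.pop();
    }
    return sorted;
}
```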
"I'm curious about your discussion of the log-log n lower bound of predecessor queries. This doesn't cover all data representations does it?"
All the discussion in my post refers to "the real model" (word RAM). The optimal lower bound for predecessor is in [Patrascu-Thorup STOC'06]. For instance, it improves over [Beame-Fich] that you cite. The sqrt{lg n / lglg n} bound is for very very large word size (see the complete bound in my paper). As a rule-of-thumb simplification, I think predecessor is log-logarithmic because the word size is usually poly(log n).
The Munro paper that you mention is an exercise in what flying pigs would imply. They assume your memory is organized like a tree, and in one shot you can read an entire root-to-leaf path. Needless to say, this is not the normal idea of computation.
Paul, thanks for the link to [Kedlaya-Umans]. I was not aware of the result.
I have nothing against algebraic algorithms; it's just that my knowledge of them is rather limited.
Anon:
"Even just the univariate data structure is awesome. For one, it says that the Carter-Wegman hash functions which we all have grown up with can be preprocessed for fast evaluation."
You have to be careful about what fast means. If you can hash in O(log n) time, it's not so interesting, since you can use binary search trees to deterministically assign a unique hash...
By the way, Siegel is tight (there is a lower bound), except for the dependence on the memory. The main problem with Siegel is making it practical.
"The Munro paper that you mention is an exercise in what flying pigs would imply. They assume your memory is organized like a tree, and in one shot you can read an entire root-to-leaf path. Needless to say, this is not the normal idea of computation."
You do know that this memory model has been implemented and sold commercially, right?
"You do know that this memory model has been implemented and sold commercially, right?"
First of all, that idea does not scale (it needs one bit per member of the universe; a static data structure of that size is trivial even in regular memory).
But ignoring the details, I have two issues with it:
(1) It's also possible to make an efficient VLSI circuit for solving flow in planar graphs. Studying such circuits is not bogus, but also not the most exciting topic. Hardware today needs to be as universal as possible, so don't expect this VLSI circuit to be too influential in the world.
(2) From an intellectual standpoint, the programming model that millions and millions of people have in their brain is the word RAM. This is "hard-coded" into all programming languages today. This is what I'm studying. Special-purpose hardware is a different topic than a theory of programming.
"The Munro paper that you mention is an exercise in what flying pigs would imply..."
So, you are arguing that Rank and Select on a bitvector cannot be done in constant time, which is fundamental to the analysis of all self-indexing methods (such as those developed by Navarro, Ferragina, Makinen, Sadakane, and many others)? I have to assume your real issue is that these methods make a bounded universe assumption, which is often perfectly reasonable.
"I have to assume your real issue is that these methods make a bounded universe assumption, which is often perfectly reasonable."
There are two separate issues: the special tree memory (see my previous comment), and predecessor for linear universes (which means assuming you have as much space as the universe).
There is nothing unrealistic about a linear universe. In some problems it is a true assumption, and in the lower bound you can see that time=1 when you set u = n*polylog(n).
On the other hand, in other problems, it is not true. The lower bound is Omega(lglg n) even for a universe of n^{1+eps}. It is very easy to get into a situation with a quadratic universe (think edges of a sparse graph etc).
The case of a quadratic universe is hugely important for range queries (as you go down in your hierarchy, the universe does not shrink, only your space — so predecessor with a quadratic universe is often a bottleneck there).
Since we are talking about a mathematical quantity (the universe), I don't see how we can have an argument about it. It is what it is, depending on the problem where you apply predecessor.
"There are two separate issues: the special tree memory (see my previous comment), and ..."
I'm not sure what you mean about special tree memory. From Okanohara and Sadakane, "Practical entropy-compressed rank/select dictionary": "Many data structures have been proposed for Rank / Select dictionaries, most of which support the queries in constant time on word RAM [7, 13, 16, 19, 21] using n + o(n) bits or nH_0(S) + o(n) bits (H_0(S) ≤ 1 is the zero-th order empirical entropy of S)."
Either you believe Rank and Select can be done in constant time or you do not. If this is true, Predecessor queries can also be done in constant time. These methods are not space optimal, but that is a different question entirely. I'm not trying to start a flamewar, I'm just trying to reconcile your lower bound argument with my understanding.
I think the main confusion comes from my responding to 2 (or 3?) anonymous commenters believing they are the same. My apologies. (Maybe you could sign with Anon1, Anon2 for identification...)
So, predecessor can be solved in constant time on this special flying-pigs memory architecture: [Brodnik, Carlsson, Karlsson, Munro: SODA 2001]. As one Anonymous says, this type of memory was even implemented.
On the word RAM (the standard model), the tight trade-off between query time / space / universe is understood (see my paper from STOC'06).
In the succinct papers that you cite, n is being used as the universe, not the number of elements (ones). I am not fond of this practice; they should use u.
If you do have space proportional to the size of the universe, predecessor search can be solved in constant time (it's not too hard).
"These methods are not space optimal, but that is a different question entirely."
It is very much the same question. Any static problem can be solved in constant time if you have space proportional to the universe of queries: simply tabulate all answers.
What we look at are space-time trade-offs. For instance, range queries in constant dimension can be solved in O(1) time with poly(n) space. But the really interesting question is n*poly(lg n) space, and there are hundreds of papers working on the time bound in this case.
For predecessor search, the problem is hard with n*poly(lg n) space and n^{1+eps} universe. But of course, it is easy with n^{1+eps} space.
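To make the tabulation point concrete, here is a minimal sketch of predecessor search with space proportional to the universe: precompute the answer for every possible query, so each query is a single array lookup. (This is just the generic tabulation trick mentioned above, not any specific paper's data structure.)
```cpp
#include <cstdint>
#include <vector>

// Predecessor with space proportional to the universe: tabulate the answer
// for every possible query. pred[q] = largest key <= q, or -1 if none.
// Build in O(u + n) time, answer queries in O(1). Only feasible when the
// universe u is comparable to n (e.g., u = n * polylog(n)).
struct TabulatedPredecessor {
    std::vector<int64_t> pred;  // one entry per element of the universe

    TabulatedPredecessor(uint64_t universe, const std::vector<uint64_t>& keys) {
        std::vector<bool> present(universe, false);
        for (uint64_t k : keys) present[k] = true;  // assumes k < universe
        pred.assign(universe, -1);
        int64_t last = -1;
        for (uint64_t q = 0; q < universe; ++q) {
            if (present[q]) last = static_cast<int64_t>(q);
            pred[q] = last;
        }
    }
    int64_t query(uint64_t q) const { return pred[q]; }  // O(1) lookup
};
```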
"You have to be careful about what fast means. If you can hash in O(log n) time, it's not so interesting, since you can use binary search trees to deterministically assign a unique hash..."
But your goal might not be to perfectly hash -- you might just want a k-wise independent hash function. Anyway, this is why I wanted the dependence on q to be removed.
ReplyDeleteThe sqrt{lg n / lglg n} bound [for the predecessor problem] is for very very large word size (see the complete bound in my paper). As a rule-of-thumb simplification, I think predecessor is log-logarithmic because the word size is usually poly(log n)Mihai: This comment was incomplete and your answer to the questioner was a bit misleading. The point is that for poly(log n) word sizes, the log-logarithmic upper bound is even smaller than sqrt(log n/loglog n). However, [Beame-Fich] does yield even smaller upper bounds than you have stated as your lower bounds in the column (shaving off a logloglog factor, since the subject of shaving off log factors has come up) but the algorithm requires n^(1+epsilon) memory cells rather than quasilinear size memory.
"But your goal might not be to perfectly hash -- you might just want a k-wise independent hash function."
True, hashing is useful in many contexts. It's just that I don't know an application where O(lg n) independence is not enough (since the bounds decrease exponentially in the independence). For O(lg n) independence you get no benefit.
PS: It should be clear that I have nothing against the [KU] paper. I actually find it very cool, from my outsider's perspective.
"This comment was incomplete and your answer to the questioner was a bit misleading."
Paul: Indeed, the answer was as incomplete as they get :) I am desperately trying to avoid discussing the whole predecessor trade-off, since it's too long and hard to grasp.
By a majority vote of the applications, I think that polynomial universe — near linear space is the most used regime (range queries, persistence...). So I'm trying to quote the bounds in that regime. But it really depends on the application... Do you think I should make my default quote in another range of parameters?
Now that things have settled down, I'll add my compliments.
This was the best CS-related blog post I've read in quite a while.
People, stop debating. Is Mihai's point not clear?
What he is doing is useful, what he is working on is significant, and he is working on the meaningful problems in the right model. People should follow him. I can't agree with him more!
Great post, but that anonymous commenter doesn't sound like a complexity theorist to me:
"What makes it tough for some of the papers you cite is the view that shaving off log factors is often viewed as much less interesting than larger improvements. The standards required for judging papers that shave off log factors (or sub-logarithmic factors) seems different - if you are going to do this then, as Michael Mitzenmacher keeps suggesting, you ought to implement your new algorithm and show how much better it is in practice. I think that there is skepticism in general that improvements come at the expense of more complex algorithms that are worse in practice."
Complexity theorists, especially people doing lower bounds, do not really look at polynomial vs. non-polynomial time. (Polynomial time is just a nice abstraction sometimes, just as Oh notation is often useful.) In particular, many have spent much time trying to prove that the FFT (or any other explicit linear transformation) is rigid and hence cannot be computed by a linear circuit of O(n) size and O(log n) depth. Also, many people were excited about Rossman's recent result that clique requires AC^0 circuits of n^{Omega(k)} size.
A comment, not a critique: I think part of the reason complexity gets more attention than data structures is simply because there are no textbooks for the latter. Papadimitriou's book was widely influential (perhaps not always in a good way), and the new Arora-Barak and Goldreich books will be similarly so. Many (most?) universities do not offer graduate courses in data structures, and this is not likely to change until a good textbook becomes available. Even at the undergrad level, students see some complexity theory in their computability class, but do not see any hint of Advanced Data Structures in their data structures class.
PS: If I am missing some great textbook(s) on the subject, my apologies in advance.
How is Overmars' book? (I haven't read it myself.) It's somewhat old though.
Mihai, why don't you write a textbook for data structures then? :)
Indeed, all books in data structures are outdated by at least 2 decades. In lower bounds (the other area I belong to), there are no books of any kind.
I might write a book at some point, but its effectiveness may be limited. If anybody wanted to teach advanced data structures, they could just follow our lecture notes:
http://courses.csail.mit.edu/6.897/spring05/
The reason such a class is not taught at most places is the lack of faculty, not the lack of a book.