Monday, February 4, 2008

Introducing TCS?

What do you present in a 2-day course with an audience who is:
  • really smart
  • generally unaware of TCS, but willing to spend time listening to you
  • half (non-theory) computer scientists and half hard-core theoretical mathematicians?
With a few hours to decide and 2 mornings to prepare, I designed the course detailed below. I hope this contained enough brain-teasers, cool ideas, and real applications to be an effective introduction to TCS. But I am very curious to hear about some "must-teach" topics that I forgot --- leave a comment.

----- Day 1 -----

Nearest Neighbor in the Plane.
Voronoi diagrams, point location with O(n) space and O(lg n) time via persistent data structures.
  • historical references: the authoritative paper is [Driscoll, Sarnak, Sleator, Tarjan: Making Data Structures Persistent. JCSS 1989; STOC 1986]. The initial papers were [Sarnak, Tarjan: Planar Point Location Using Persistent Search Trees. CACM 1986]; [Cole: Searching and Storing Similar Lists. JALG 1986]
  • today these results are described in many places. See for example these lecture notes from David Karger's Advanced Algorithms.
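The persistence idea itself fits in a few lines. Here is a toy path-copying persistent BST in Python (my own illustration, not from the papers above): each insert copies only the search path, so every old version of the tree stays intact and queryable. The real point-location structure uses balanced trees and a cleverer technique to get O(1) amortized extra space per update.

```python
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return the root of a NEW version; the old version is untouched."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    return Node(root.key, root.left, insert(root.right, key))

def contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

# Keep one root per version: versions[i] is the tree after i inserts.
versions = [None]
for k in [5, 2, 8, 1]:
    versions.append(insert(versions[-1], k))
```

Point location then amounts to a binary search in the version corresponding to the query's x-coordinate.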
Going below logarithmic time. Talk briefly about the field of TCS exploring sublogarithmic running times, which is also one of the few areas where we have good lower bounds. I only taught van Emde Boas as an example, but mentioned a lot of things:
  • my paper with Timothy Chan on sublogarithmic point location. Open problem: lower bound.
  • my paper with Mikkel Thorup giving optimal lower bounds for predecessor search. (In particular, van Emde Boas is optimal for near-linear space.)
  • my favorite algorithm for sorting in o(n lg n) time is signature sort, which sorts in O(n) time when the word size is w = Ω(lg^{2+ε} n). See for example these lecture notes from our Advanced Data Structures course.
  • if my presentation of van Emde Boas managed to confuse you, check out one of the many expositions, such as these lecture notes from our course.
Nearest Neighbor in Constant Dimension. Curse of dimensionality: the Voronoi diagram grows like n^{O(d)}. Space-filling curves: how to get a (1+ε)-approximation in any constant dimension, with linear space and predecessor query time.
  • this is mostly from [Chan: Closest-point problems simplified on the RAM. SODA 2002]. Paper here.
  • I blogged about this a while ago.
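The heart of the space-filling-curve approach fits in a few lines: points can be compared in Z-order (shuffle order) without ever interleaving bits, by finding the dimension whose coordinates differ in the highest bit. Below is my Python rendering of this comparator trick (the approximate-NN machinery on top of it, with shifted copies of the point set, is not shown); the naive Morton code is there only to sanity-check it.

```python
from functools import cmp_to_key

def less_msb(a, b):
    """True iff the highest set bit of a is strictly below that of b."""
    return a < b and a < (a ^ b)

def z_less(p, q):
    """Compare points (tuples of nonnegative ints) in Z-order, without
    explicitly interleaving bits."""
    best, dim = 0, 0
    for d in range(len(p)):
        y = p[d] ^ q[d]
        if less_msb(best, y):       # dimension d differs at a higher bit
            best, dim = y, d
    return p[dim] < q[dim]

def morton(p, bits=16):
    """Naive Morton code (dimension 0 most significant per bit level),
    used only to check the comparator."""
    nd, code = len(p), 0
    for b in range(bits):
        for d in range(nd):
            code |= ((p[d] >> b) & 1) << (b * nd + (nd - 1 - d))
    return code

zkey = cmp_to_key(lambda p, q: -1 if z_less(p, q) else (1 if z_less(q, p) else 0))
pts = sorted([(4, 3), (7, 7), (1, 5), (0, 0)], key=zkey)
```

Once the points are sorted in this order, approximate nearest neighbors reduce to predecessor queries in the sorted list.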
High dimensions. Brief motivation of high dimensions (machine learning "features" and such). Stories and intuition about Lp norms; how God seems to love L2. Johnson-Lindenstrauss, with proof. Brief story about approximate nearest neighbor (LSH and n^{1/ε^2}); lower bounds.
  • there are many expositions of JL. For some very complete proofs, see these lecture notes from Michel Goemans' class.
  • the latest LSH paper, with a good bibliography, is [Andoni, Indyk: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, CACM 2008]. You can see it on the new and nifty CACM webpage (click the Contents button, then it's on page 117).
  • our lower bounds on JL-based algorithms.
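If the proof felt abstract, the construction itself is almost trivial to run. Here is a quick Python illustration (mine, with no attempt at optimized constants): project onto a random Gaussian matrix scaled by 1/sqrt(k), and pairwise distances survive up to small distortion.

```python
import math, random

def jl_project(points, k, rng):
    """Map d-dim points to k dims via a random Gaussian matrix scaled by
    1/sqrt(k), so squared lengths are preserved in expectation."""
    d = len(points[0])
    R = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in R] for p in points]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

rng = random.Random(0)
pts = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(10)]
proj = jl_project(pts, 100, rng)    # 500 dimensions down to 100
```

Note that the target dimension depends only on the number of points and the desired distortion, not on d.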
Norm estimation in the streaming model. Why we care: query plans and optimization in SQL engines. Estimating the L2 norm via Johnson-Lindenstrauss. Overview of other norms.
  • the original paper, now with a Gödel Prize: [Alon, Matias, Szegedy: The Space Complexity of Approximating the Frequency Moments. JCSS 1999, STOC 1996]
  • I believe the latest paper is [Indyk, Woodruff: Optimal Approximations of the Frequency Moments of Data Streams. STOC 2005]
  • for a nice description of norm estimation, and a whole lot more about streaming algorithms, see Piotr's class
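The AMS "tug-of-war" estimator for the L2 norm is short enough to sketch. In my toy Python version below, the random ±1 signs are memoized in dictionaries for clarity; the actual algorithm generates them from 4-wise independent hash functions, which is what makes the space truly small.

```python
import random

def ams_f2(stream, k, rng):
    """Tug-of-war sketch: estimate F2 = sum_i f_i^2 of a stream using k
    counters. For each counter Z = sum_i s(i) * f_i with random signs s,
    so E[Z^2] = F2; averaging k counters reduces the variance (the paper
    takes medians of means for high-probability bounds)."""
    signs = [{} for _ in range(k)]
    Z = [0] * k
    for item in stream:
        for t in range(k):
            if item not in signs[t]:
                signs[t][item] = rng.choice((-1, 1))
            Z[t] += signs[t][item]
    return sum(z * z for z in Z) / k
```

The estimator needs only one pass and the k counters; it never stores the frequency vector itself.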

----- Day 2 -----

Define graph sparsity. Examples; handwave about planar separators and algorithmic applications. [What is a good citation for this topic?]

Expander graphs. Handwave about explicit constructions. 2-minute overview of pseudorandomness. Randomness-efficient amplification via random walks on expanders.
  • An excellent introduction to pseudorandomness is Salil Vadhan's course. If I got you confused, lecture 8 concerns error reduction by expander walks.
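To make the randomness accounting concrete, here is a toy Python sketch (entirely my own illustration): generate correlated seeds by walking on the 3-regular graph on Z_p with edges x → x+1, x-1, x^{-1}, a classic explicit expander for prime p. In a real amplification you would run the randomized algorithm once per seed and take the majority (or OR) of the answers.

```python
import random

def expander_walk_seeds(p, t, rng):
    """t correlated seeds in {0..p-1} from a random walk on a 3-regular
    expander on Z_p (p prime). Total cost: lg(p) random bits for the start
    plus O(1) bits per step, versus t * lg(p) bits for t fresh seeds."""
    x = rng.randrange(p)           # one truly random starting vertex
    seeds = [x]
    for _ in range(t - 1):
        step = rng.randrange(3)    # ~1.6 random bits per step
        if step == 0:
            x = (x + 1) % p
        elif step == 1:
            x = (x - 1) % p
        else:
            x = pow(x, p - 2, p)   # modular inverse; 0 is its own neighbor
        seeds.append(x)
    return seeds
```

The hitting property of expander walks is what makes the error decay exponentially in t despite the correlation.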
Metrics. Embeddability, any metric embeds into L1 with distortion O(lg n).
  • a whole course on metric embeddings, by Michel Goemans.
  • CRC book chapter by Piotr Indyk and Jiri Matousek.
  • there is a ton of recent work (particularly lower bounds = nonembeddability) that I cannot hope to link to here.
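To make "embeds into L1" concrete, here is a Fréchet/Bourgain-style sketch in Python (mine; constants and the probabilistic analysis are entirely ignored): each coordinate is the distance to a random subset, with subset sizes running over powers of two. The contraction direction is immediate from the triangle inequality; the O(lg n) bound on how much distances can shrink is the hard part of Bourgain's proof.

```python
import random

def embed_l1(D, rng, copies=8):
    """Fréchet-style embedding. D is an n x n distance matrix of a metric.
    Each output coordinate is min-distance to a random subset S; subset
    sizes run over 1, 2, 4, ..., n. The L1 distances of the images never
    exceed D (each coordinate is 1-Lipschitz), and with enough copies
    they shrink by at most an O(lg n) factor."""
    n = len(D)
    coords = []
    size = 1
    while size <= n:
        for _ in range(copies):
            S = rng.sample(range(n), size)
            coords.append([min(D[x][s] for s in S) for x in range(n)])
        size *= 2
    m = len(coords)
    return [[c[x] / m for c in coords] for x in range(n)]

def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))
```

The number of coordinates is O(lg^2 n), independent of the structure of the input metric.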
Sparsest-cut problem. Why we care about graph partitioning, and a bit about practice. Theory: cut metrics; L1 = cone of cut metrics; use an LP to optimize over all metrics and then embed into L1. Handwave about negative-type metrics, SDPs, and O(sqrt(lg n)) approximation.
  • the famous paper that first used embeddings for sparsest cut is [Linial, London, Rabinovich: The Geometry of Graphs and Some of its Algorithmic Applications. Combinatorica 1995, FOCS 1994]
  • the sqrt(lg n) paper is [Arora, Rao, Vazirani: Expander Flows, Geometric Embeddings, and Graph Partitionings. STOC 2004]. See here.
  • a lot of recent (and not so recent) research has been done about this problem. A personal favorite here is [Spielman, Teng: Nearly linear time algorithms for graph partitioning, graph sparsification, and solving linear systems, STOC 2004]
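To pin down the object all these relaxations approximate, here is the (uniform) sparsest-cut definition written out as exhaustive Python, exponential time and strictly for toy graphs:

```python
from itertools import combinations

def sparsest_cut(n, edges):
    """Exhaustive uniform sparsest cut: minimize crossing(S) / (|S|*|V-S|)
    over all nontrivial vertex sets S. The LP/SDP algorithms above
    approximate exactly this quantity in polynomial time."""
    best, best_S = None, None
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            Sset = set(S)
            crossing = sum(1 for u, v in edges if (u in Sset) != (v in Sset))
            val = crossing / (len(Sset) * (n - len(Sset)))
            if best is None or val < best:
                best, best_S = val, Sset
    return best, best_S
```

On two triangles joined by a single edge, the sparsest cut is of course the joining edge.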

11 comments:

  1. (Warning: selfish plug.)

    If you have database folks, they may enjoy this paper:


    Kamel Aouiche and Daniel Lemire, A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP, DOLAP 2007, 2007.

    http://arxiv.org/abs/cs.DB/0703058

  2. I think there is a better and simpler result than the Spielman-Teng STOC 2004 paper now: check out:

    Local graph partitioning using pagerank vectors,
    FOCS 2006, R. Andersen, F. Chung, K. Lang

  3. Optimization. One nice paper to talk about is the Edmonds paper giving the blossom algorithm for finding a maximum matching. The algorithm might be too hard to explain, so you could stick to the Hungarian method on bipartite graphs. It is easy to "digress" from here and mention that it is one of the first papers arguing that there should be a difference between polynomial-time and exponential-time algorithms in any hierarchy. This later formally developed into the P vs NP question.

    You can digress further and talk about approximation algorithms, both from positive and negative sides.

  4. Seems a little esoteric for an *introduction* to TCS. How about more basic things like NP-completeness and the power of randomization? Or more basic algorithms and data structures? Or had the audience already seen stuff like that elsewhere? (I would assume the average mathematician had not.)

  5. Indeed, I was not too clear on the level of the participants. They all know CS kinds of stuff, like NP-completeness, shortest paths, etc. The goal was to teach them that TCS did something nontrivial in the last 20 years :)

  6. Maybe you could have tried something more colorful like zero knowledge proofs for NP.

  7. Maybe you could have tried something more colorful like zero knowledge proofs for NP.

    To the contrary, this would've been a disaster. You suggest telling CS people something like "we have these really nifty tricks which are really cool if you love theory, but there's no particular way in which they will affect what you do."

    I was looking for stuff with real applications. Everybody was enthusiastic about the relation between SQL engines and norm estimation. It is a clean theoretical problem which does something for the real world.

  8. Primes in P might have been exciting and new.

  9. Hi Mihai,

    Just a note: in the norm estimation stream, the latest paper actually is --

    "Smooth Histograms"
    Vladimir Braverman, Rafi Ostrovsky.
    FOCS 2007

    It has a pretty interesting approach, and also improves the Indyk-Woodruff result.

    PCPs could have been another interesting topic, but selecting what to talk about from that area would have been a challenging task, I reckon.

    Given that I work in Cryptography, I can't stop myself from commenting on the ZK stuff: you need to know a bit more about ZK. Recently Avi Wigderson presented a fantastic series of 3 lectures at UCLA, one of which included a significant part on ZK. The audience was simply amazed, and I got the impression that they truly enjoyed it (there was so much curiosity -- and there were no nifty tricks to explain!).

    The audience was very much comparable to the one you had -- a lot of UCLA Math department faculty (including Terence Tao :) ), a good amount of CS faculty from other areas, and several students.

    Omkant

  10. Read "nifty" as "nasty" in my last post.

    Omkant

  11. Hi folks,

    Just a clarification, in case someone will still read the last few comments: the (very neat) FOCS'07 "Smooth Histograms" paper addresses the "sliding window" model, where at any time, the output depends only on the last n elements in the stream. In contrast, our STOC'05 paper addresses the model where the output depends on the whole stream, and the stream is allowed to contain deletions. The two models are incomparable in general.

    Cheers,
