In the past 2 years, I have been a member of the Host Scientific Committee (HSC) of the IOI. This is the body that comes up with problems and test data. While it consists primarily of people from the host country (Bulgaria in 2009, Canada in 2010), typically the host will have a call-for-problems and invite the authors of problems they intend to use.
This year, I was elected member of the International Scientific Committee (ISC). This committee works together with the HSC on the scientific aspects, the hope being that a perenial body will maintain similar standards of quality from one year to another. There are 3 elected members in the ISC, each serving 3-year terms (one position is open each year).
I anticipate this will be a lot of fun, and you will probably hear more about the IOI during this time. When a call for problems comes up (will be advertised here), do consider submitting!
I will end with an unusual problem from this IOI:
Consider the largest 50 languages on Wikipedia. We picked 200 random articles in each language, and extracted an excerpt of 100 consecutive characters from each. You will receive these 10000 texts one at a time in random order, and for each you have to guess its language. After each guess, your algorithm learns the correct answer. The score is the percentage of correct guesses.Considering the tiny amount of training data and the real-time nature of guessing, one might not expect too good solutions. However, it turns out that one can get around 90% accuracy with relatively simple ideas.
To discourage students from coding a lot of special rules, a random permutation is applied to the Unicode alphabet, and the language IDs are random values. So, essentially, you start with zero knowledge.
My own approach was to define Score(text, language) as the minimal number of substrings seen previously in this language that compose the text. This can be computed efficiently by maintaining a suffix tree for each language, and using it to answer longest common prefix queries.