ICFP 2010 short report

Posted on October 4, 2010.

I’ve recently returned from ICFP 2010 (the International Conference on Functional Programming). This was the second time I attended, and like last year, it was a very interesting conference. This year I even managed to do a keysigning with a fellow Debian Developer (hello Sylvain!)

While there were many talks, I’ll only mention a few key ones that I found very cool.

Note: I’m a layman in this field. The text below is my own interpretation of the talks and may contain errors or wrong facts; corrections are welcome!

Workshop on ML

I wasn’t present at this workshop, but other people pointed me to this very cool talk:

Mirage project

The abstract, to be found at http://www.cs.rit.edu/~mtf/ml2010/program.html#Program (search for Mirage), says:

Our framework, dubbed Mirage, extends the Objective Caml language with I/O extensions and a custom runtime to emit binaries that execute as a paravirtual operating system directly under Xen.

The custom run-time is significantly simpler than a general-purpose operating system, with a static single-process 64-bit address space, and “zero-copy” I/O that works directly with the OCaml garbage collector. As a result, Mirage applications exhibit significant performance speedups for I/O and memory handling versus the same code running under Linux/Xen.

It’s extremely interesting, IMHO. You write your code as a more-or-less normal OCaml program, but you compile it with this special tool-chain, and you boot it directly on top of Xen, as a domU, without an intermediate OS.

The project’s Git repo is at http://github.com/avsm/mirage, and it links to a slightly older presentation (http://anil.recoil.org/papers/2010-hotcloud-lamp.pdf) which IMHO is required reading for anyone interested in cloud-related stuff.

ICFP proper

ICFP, the main conference, is, as usual, full of math, designed to scare people away. Err, I mean, designed to present new developments in this field.

The papers are gathered in this reddit post: http://www.reddit.com/r/haskell/comments/dji4v/papers_from_icfp_2010/.

XenServer

The paper http://anil.recoil.org/papers/2010-icfp-xen.pdf describes the XenServer project, and more specifically its management interface, which is written entirely in OCaml by a team of ~10 people.

They also have a bit of bin-packing code (http://xenbits.xen.org/xapi/xen-api.hg?file/95b9e4f1b9dd/ocaml/xapi/binpack.ml), though it is much simpler than what the ganeti-htools project I’m working on does (no balancing or similar; a toy sketch of the general idea follows below). On the other hand, it is interesting to see that they committed to writing the entire management suite in OCaml, as opposed to our Python plus just a few bits of Haskell. I envy them, very much so ;)
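For flavour, here is a toy sketch of the kind of problem involved (my own illustration in Haskell, not their OCaml code): greedy first-fit packing of items into fixed-capacity bins, with no rebalancing at all.

-- toy first-fit bin packing: put each item into the first bin with room,
-- opening a new bin when none fits
firstFit :: Int -> [Int] -> [[Int]]
firstFit cap = foldl place []
  where
    place (b:bs) x
      | sum b + x <= cap = (x : b) : bs
      | otherwise        = b : place bs x
    place [] x           = [[x]]

main :: IO ()
main = print (firstFit 10 [6, 3, 5, 2, 7, 4])
-- prints [[3,6],[2,5],[7],[4]]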

The fun tidbit was how stable they found the OCaml tool-chain: in three years, they had just one bug related to it. It was a segfault caused by broken formatting code in the handling path of a seldom-triggered Python exception, which let a %s pass through verbatim into generated C code, so printf received the wrong format specifiers. So… in an OCaml project, a serious bug was due to the lack of type checking in string formatting in a Python component :)

Overall I found interesting parallels between their experience and my own with Haskell: from “is this language/tool-chain stable enough for use?” to concerns about using FP for a significant project (though of course there are orders of magnitude of difference between their case and mine).

Super-compilation

There were two papers on this topic; one at ICFP proper and one at the Haskell symposium.

If I understood things right, super-compilation is the opposite of lazy evaluation. But let’s back up a bit.

First, programs written in a functional language contain many pure functions. Pure functions, being side-effect-free, exhibit many interesting properties, one of which is that they permit decoupling the computation of a value from the declaration of that computation.

One technique derived from this is lazy evaluation: even though you write y = sin(x) + 1, the actual call to sin and the increment are not performed until you need the actual value of y (not just to store it in y, that is).
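As a small illustration of this (my own sketch, not from any of the talks), in GHC the binding itself does no work; only forcing the value does:

import Debug.Trace (trace)

main :: IO ()
main = do
  -- binding y only builds a thunk; the trace message does not fire here
  let y = trace "computing sin" (sin 0.5 + 1) :: Double
  putStrLn "y is bound, but nothing has been computed yet"
  -- printing y finally forces the thunk, so "computing sin" appears now
  print y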

Super-compilation is the opposite of laziness. While laziness postpones a computation as long as possible, in the hope it won’t be needed at all, super-compilation runs the computation way ahead of time, even before the program starts, that is, at compile time. Since programs use input data at runtime, which is not known at compile time, the trick is to pre-compute as much as possible given this incomplete information; and the papers showed that it’s possible to pre-compute much more than you’d usually believe.

Optimising compilers already do some form of CAF (constant applicative form) optimisation, but the techniques presented in both papers go beyond simple CAFs. The details are in the papers, and are complicated, but one simple example is converting this:

ones = map inc [1..]   -- assuming inc n = n + 1

(note that this is an infinite list, which you don’t want to fully compute at any time, neither at compile time nor at runtime), into:

ones = [2..]
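As a quick sanity check (my own snippet, not from the papers), the two definitions agree on any finite prefix:

-- the original definition, with inc written out as (+1)
onesOrig :: [Integer]
onesOrig = map (+1) [1..]

-- what a super-compiler could produce instead
onesSuper :: [Integer]
onesSuper = [2..]

main :: IO ()
main = print (take 10 onesOrig == take 10 onesSuper)   -- prints True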

The details of how this is done depend on the exact super-compilation implementation, and not all implementations can do the same optimisations; in the presentations at this ICFP, the idea was to split things into static and non-static parts and build on that, or to do partial evaluation, etc.

The result is that super-compilation shows significant increases in speed: up to 80% in some synthetic benchmarks, and while a few benchmarks are not sped up at all, most are. The downside is an explosion in code size, as many more things are pre-computed. Still, as a research topic, this is very interesting.

My own paper

Surprisingly, it seems I survived giving my talk (just an experience report, not fancy stuff like the rest). Either people actually found it interesting (less likely), or they were very polite (more likely). Anyway, the paper is on the papers page (shameless self-promotion), if you care.

Haskell symposium

This is co-located with ICFP, and is more practical than ICFP (which is, as I said, way too much math). The papers are again available on reddit: http://www.reddit.com/r/haskell/comments/dkzn4/papers_from_the_haskell_symposium_2010/.

Orc: Concurrent Orchestration in Haskell

How do you easily write code for synchronising concurrent work? As an example, how complicated is it to solve this problem:

[from the Orc literature] we might wish to contact two airlines simultaneously seeking price quotes. If either quote comes back below a threshold price, say $300, then let’s buy a ticket immediately. On the other hand if both quotes exceed the threshold, then buy the cheapest ticket. Additionally, buy a ticket if the other airline does not give a timely quote, or notify the user if neither airline provides a timely quote.

Now, in my experience, this would require manual management and synchronisation of threads (sketched below). For example, in Python, the early abort of querying the price from airline B once airline A returned a good price would be non-trivial; moreover, adding the two extra conditions would again not be a simple composition of rules. In Orc, however, it takes all of 10 lines of code to do this, safely. I won’t copy the code here; see the paper for details, or the API docs.
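To make the “manual” part concrete, here is a rough sketch of just the core decision logic in plain Control.Concurrent (my own illustration; quoteA and quoteB are hypothetical stand-ins for the real network calls):

import Control.Concurrent

data Event = Quote String Int | Timeout

-- hypothetical stand-ins for the real airline queries
quoteA, quoteB :: IO Int
quoteA = threadDelay 100000 >> return 320
quoteB = threadDelay 150000 >> return 290

threshold :: Int
threshold = 300

main :: IO ()
main = do
  chan <- newEmptyMVar
  _ <- forkIO (quoteA >>= putMVar chan . Quote "A")
  _ <- forkIO (quoteB >>= putMVar chan . Quote "B")
  _ <- forkIO (threadDelay 500000 >> putMVar chan Timeout)  -- the deadline
  e1 <- takeMVar chan
  case e1 of
    Timeout -> putStrLn "no timely quote from either airline"
    Quote n1 p1
      | p1 < threshold -> putStrLn ("buy from " ++ n1 ++ " immediately")
      | otherwise -> do
          e2 <- takeMVar chan
          case e2 of
            Timeout     -> putStrLn ("buy from " ++ n1)  -- the other was late
            Quote n2 p2 -> putStrLn ("buy the cheapest, from " ++
                                     if p1 <= p2 then n1 else n2)

Even this simplified version, which doesn’t cancel the losing request and handles only a single shared timeout, is already fiddly; Orc collapses all of it into a handful of declarative combinators.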

And the best part is, this is just a cabal install orc away! I’ll be very interested to see how I can use this in practice, as it seems indeed a very powerful library.

Nikola: Embedding Compiled GPU Functions in Haskell

Interesting project: using almost 100% Haskell code, it marshals the data to and from the GPU; in the example given, the performance is as good as hand-written CUDA code (of course, this doesn’t happen in all cases). It’s also interesting how they manage to compile the GPU-related code to actual object code at Haskell compile time, which makes it transparent for the user.
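I don’t know Nikola’s actual API, but as a rough analogy for “work done at Haskell compile time”, Template Haskell can embed a pre-computed value directly into the binary (my own sketch, not the paper’s mechanism):

{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH.Syntax (lift)

-- the squares are computed while compiling; the binary stores only
-- the resulting list, not the code that produced it
table :: [Int]
table = $(lift (map (^ 2) [1 .. 10 :: Int]))

main :: IO ()
main = print table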

Given the prevalence of high-speed GPUs, this could be a nice way to utilise them without learning yet another language (or two, since NVidia and ATI have different stacks).

They’ve also developed some interesting techniques (e.g. manual control of sharing, to prevent inlining) as part of their project.

Scalable Event Handling for GHC

This is a paper co-authored by Johan Tibell of Google Switzerland and Bryan O’Sullivan of Real World Haskell fame.

Very interesting—serving 20K QPS from a Haskell process while having 16K+ idle clients connected:

[…] the performance of the epoll back end does not begin to fall off significantly until we have 50,000 idle connections open.

The talk started with the idea that we’re on the path to serving 10⁷ clients from a single machine, and how to improve Haskell’s I/O manager to achieve (almost) 10⁵ concurrent clients. The slides (which I don’t have a link for) had somewhat more impressive numbers than the paper.
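The nice part is that, from the programmer’s point of view, nothing changes: you keep writing one lightweight thread per client with plain blocking I/O, and the runtime multiplexes all those threads over the event back end. A minimal sketch (my own, not from the paper):

import Control.Concurrent (forkIO)
import Control.Monad (forever)
import Network (PortID (..), accept, listenOn, withSocketsDo)
import System.IO

-- one lightweight Haskell thread per client; the new I/O manager is what
-- keeps thousands of such (mostly idle) threads cheap
main :: IO ()
main = withSocketsDo $ do
  sock <- listenOn (PortNumber 8080)
  forever $ do
    (h, _, _) <- accept sock
    forkIO $ do
      hSetBuffering h LineBuffering
      hGetLine h >>= hPutStrLn h   -- echo one line back
      hClose h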

Oh, and this will come with GHC 7.0, for free :)

An LLVM Backend For GHC

The eternal fight to do better than the C backend, in other words, to not have to translate Haskell to C… It is interesting to see how many languages target LLVM nowadays, as opposed to plain old C via gcc. A fun trivia item: according to the author (if I heard correctly in the talk), there are many features in LLVM, e.g. the garbage collection support, that are not actually used by 90% of the languages targeting it.

The results are interesting: overall, not much speedup is gained, which seems to be due to the hard-to-optimise Cmm intermediate language inside GHC; but a few benchmarks already get more than 10% speedup, even in the current form.

This will also come in GHC 7.0, although not enabled by default; it probably needs to mature some more…
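If I understood correctly, once GHC 7.0 is out, trying the new backend should be just a flag away (assuming the LLVM tools are installed), something like:

ghc -fllvm -O2 --make Main.hs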

Conclusion

I’ve enjoyed this conference a lot, and I’ve learned a lot from it. Given that many of the talks are about research topics, it’s harder to apply them in day-to-day work, but I think the pay-offs will be long term rather than immediate.