Wednesday 24 June 2015

Highlights from PyData London 2015

PyData London 2015 took place last weekend, hosted at the shiny Bloomberg offices in London. It was a great experience, full of knowledge sharing and friendly discussions: all the photos taken by Florian Rathgeber are here to testify, and soon videos of the talks should follow. Thanks a lot to the NumFOCUS Foundation, whose motto is "Open code, Better science", for making this event possible.

Among the memorable talks was the one from Juan Luis Cano Rodríguez on Jupyter (see abstract) and the role of IPython notebooks in reproducible science. With all his energy and love of Python, he will go far. The key idea behind the notebooks is to display and run your code in the browser, embedding the output in the page, be it simple numbers or complex graphics. When the Python (or R/Julia/...) kernel is shut down, the page becomes frozen and ready to be shared. Others can quickly browse it or re-run the code as they wish. Just provide a link to download the input data and you have the perfect mix for reproducible experiments. See here for a live demo in the Nature journal.

PyData London 2015 Conference
Juan starting his talk on Jupyter.
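That "frozen page" behaviour is easy to see in the file format itself: an .ipynb file is plain JSON in which each code cell carries both its source and its captured outputs. Here is a minimal hand-built sketch (the cell contents are made up for illustration; real notebooks are written by Jupyter itself):

```python
import json

# A minimal .ipynb file is just JSON: each code cell stores both its
# source and its captured outputs, which is why a saved notebook stays
# readable after the kernel is shut down.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "language": "python"}},
    "cells": [{
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": ["print(2 + 2)"],
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["4\n"]}],
    }],
}

# Serialise it; opening this file in Jupyter shows code and output together.
with open("demo.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)

# Anyone receiving demo.ipynb sees the code next to its output, and can
# re-run the cells against the linked input data to reproduce the result.
roundtrip = json.load(open("demo.ipynb"))
print(roundtrip["cells"][0]["source"], roundtrip["cells"][0]["outputs"][0]["text"])
```

Shipping that one file, plus a link to the input data, is the whole recipe for a reproducible experiment.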

The second talk that just made my day was Romain Guillebert's presentation on PyPy (see abstract), an alternative implementation of Python to CPython, and pymetabiosis, a bridge between CPython and PyPy. He ran a Sobel edge detection written in pure Python on images stored as Python lists: roughly speaking, CPython was running at 2 seconds per frame and PyPy at 25 frames per second. To remove any doubt, he showed it both on a recorded video and on a live feed from his webcam. If all it takes is s/python/pypy/, then I'll take it: it is much less code adaptation than any of the alternatives proposed in the high-performance talks (e.g. numpy, numexpr, numba and cython). Guido van Rossum mentioned it when he came to Imperial: if Python is slow, use PyPy. Romain gave a perfect demonstration of it.

Of course, PyPy is not ready yet. But shall we switch to a new and faster language where image processing packages might be non-existent or in their infancy, such as Julia, or shall we instead spend our efforts making Python faster? Furthermore, there are companies interested in a faster Python implementation: Lyst (where Romain works) is giving PyPy a try, and Dropbox (where Guido works) is working on Pyston, its own implementation of Python. Switching languages also carries the cost of porting existing code, and is it worth losing the simplicity of the Python syntax? Be it Julia, PyPy or another contender, the global switch will happen when a certain critical mass is reached, both in terms of user community and maturity of key packages. It is a good time to choose a side.
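For a flavour of the kind of workload Romain showed, here is a hypothetical pure-Python Sobel filter over a nested-list image (a sketch in the spirit of the demo, not his actual code). The point is that the very same file runs unmodified under both interpreters, so the "port" really is s/python/pypy/ on the command line:

```python
# Sobel edge detection in pure Python on a list-of-lists grayscale image.
# No numpy: exactly the kind of tight nested loop that CPython interprets
# slowly and PyPy's JIT compiles to fast machine code.

GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient kernel
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient kernel

def sobel(image):
    """Return the gradient magnitude of a grayscale image (list of lists)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sx = sy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = image[y + dy][x + dx]
                    sx += GX[dy + 1][dx + 1] * p
                    sy += GY[dy + 1][dx + 1] * p
            # |gx| + |gy| is a cheap stand-in for the true magnitude
            out[y][x] = min(255, abs(sx) + abs(sy))
    return out

if __name__ == "__main__":
    # Toy 5x5 image with a vertical edge down the middle.
    img = [[0, 0, 0, 255, 255] for _ in range(5)]
    for row in sobel(img):
        print(row)
```

Run it with `python sobel.py` or `pypy sobel.py`; on real video frames the per-frame loop is where the two interpreters diverge by orders of magnitude.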