Use Numba to work with Apache Arrow in pure Python

03 August 2018

Apache Arrow is an in-memory format for columnar data. In more “plain” English, it is a standard for how to store DataFrames/tables in memory, independent of the programming language. One of its most prominent uses is in the @pandas_udf decorator in Apache Spark, where it moves data quickly between Scala and Python/pandas.

AHL Python Hackathon April 2018

19 May 2018

Three weeks ago Man AHL organised an open-source hackathon at their London office. As part of the hackathon, participants were to contribute to one of the PyData projects they regularly use. To support them in making their first contribution, AHL also arranged for several core committers of open-source projects to attend the event. I joined in as the representative of the Apache Arrow project.

Play interactively with Apache Arrow C++ in xeus-cling

17 December 2017

We often use pyarrow in a Jupyter Notebook during our work. With the xeus-cling kernel, we can also use the C++ APIs directly and interactively in Jupyter.

Akka Streams for extracting Wikipedia Articles

24 February 2016

Use Akka Streams as a new technique to extract specific articles from the Wikipedia XML dump into individual files without needing to fit all the data into RAM.

Beats Music Support in Tomahawk (and the long journey on how we got there)

18 July 2014

tl;dr: With the latest nightlies (Win, Mac) you can now use your Beats Music subscription in Tomahawk. To use it, just install the Beats Music Resolver. Although Beats has a nice API, supporting it was a tough ride through our underlying multimedia stack.
