-
Use Numba to work with Apache Arrow in pure Python
·Apache Arrow is an in-memory format for columnar data. In more “plain” English, it is a standard for how to store DataFrames/tables in memory, independent of the programming language. One of its most prominent uses is the @pandas_udf decorator in Apache Spark, which moves data quickly between Scala and Python/pandas.
-
AHL Python Hackathon April 2018
·Three weeks ago MAN AHL organised an open source hackathon at their London office. As part of the hackathon, participants were asked to contribute to one of the PyData artifacts they regularly use. To support them in making their first contribution, AHL also arranged for several core committers of open source projects to be present at the event. I joined in as the representative...
-
Play interactively with Apache Arrow C++ in xeus-cling
·Often, we use pyarrow in a Jupyter Notebook during work. With the xeus-cling kernel, we can also use the C++ APIs directly in an interactive fashion in Jupyter.
-
Akka Streams for extracting Wikipedia Articles
·Use Akka Streams as a new technique to extract specific articles from the Wikipedia XML dump into single files without needing to fit all the data into RAM.
-
Beats Music Support in Tomahawk (and the long journey on how we got there)
·tl;dr: With the latest nightlies (Win, Mac) you can now use your Beats Music subscription in Tomahawk. To use it, just install the Beats Music Resolver. Although Beats has a nice API, supporting it was a tough cruise through our underlying multimedia stack.