Uwe’s Blog

My writing about data engineering, open-source development, general programming and thoughts about engineering culture.

  • Building R Arrow on Windows: A tale of two compilers

    Windows support for Apache Arrow is pretty good. There are Python wheels, Python conda packages and a binary build for R on CRAN. One thing that has long been missing, though, is a conda package for R Arrow on Windows. Thanks to a lot of experimentation and some important suggestions by Isuru Fernando (Thanks!), we...

  • The one pandas internal I teach all my new colleagues: the BlockManager

    When new members join our team, they are usually already fluent in data analysis with pandas and know their way around the typical quirks. They know that they should use vectorised functions where possible and avoid using apply with a slow Python callable. There are two main reasons I teach them the BlockManager right at the beginning....

  • Fast JDBC access in Python using pyarrow.jvm

    While most databases are accessible via ODBC, where turbodbc gives us an efficient way to turn results into a pandas.DataFrame, there are nowadays a lot of databases that either come only with a JDBC driver or whose non-JDBC drivers are not part of a free or open-source offering. To access these databases, you can use JayDeBeApi...

  • Taking DuckDB for a spin

    TL;DR: Recently, DuckDB, a database that promises to become the SQLite-of-analytics, was released and I took it for an initial test drive. Install it via conda install python-duckdb or pip install duckdb.