Uwe’s Blog

My writing about data engineering, open-source development, general programming, and thoughts about engineering culture.

  • Writing a boolean array for pandas that can deal with missing values

    When working with missing data in pandas, one often runs into issues because the main way to represent missing values is to convert the data into float columns. pandas provides efficient, native support for boolean columns through numpy.dtype('bool'). Sadly, this dtype only supports True/False as possible values and offers no way to store missing values. Additionally, numpy uses a...

  • Data Engineers: The best friends of Data Scientists you forgot to hire.

    At the moment, there are two hot topics in Computer Science: AI and Blockchain. Behind these two buzzwords, there are industries striving to build successful products. Currently, I work in the sector often labelled as AI. Usually, it is also described with other terms like Machine Learning or Big Data. In this sector, the most sought-after job is currently the...

  • Data Science I/O - A baseline benchmark for 2019

    Data Science and Machine Learning are tasks that have their own requirements on I/O. Like many other tasks, they start out on tabular data in most cases. In contrast to a typical reporting task, they don’t work on aggregates but require the data at the most granular level. Some machine learning algorithms are able to work directly on aggregates but...