TL;DR: Recently, DuckDB, a database that promises to become the SQLite-of-analytics, was released, and I took it for an initial test drive. Install it via conda install python-duckdb or pip install duckdb.
Apache Arrow is provided for Python users through two package managers, pip and conda. The first mechanism, providing binary, pip-installable Python wheels, is currently unmaintained, as highlighted on the mailing list. There have been calls for help, e.g. on Twitter, saying that we need new contributors to look after the builds. We sadly cannot point to all...
When working with missing data in pandas, one often runs into issues, as the main way to represent it is to convert the data into float columns.
pandas provides efficient, native support for boolean columns through numpy.dtype('bool'). Sadly, this dtype only allows True/False as possible values, with no possibility of storing missing values. Additionally, ...
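To make the limitation concrete, the sketch below shows what happens when a missing value is mixed into boolean data. It is an illustration under stated assumptions, not necessarily the approach the post goes on to describe: the last part relies on the nullable "boolean" extension dtype that pandas added in version 1.0, which may postdate the post.

```python
import numpy as np
import pandas as pd

# NumPy's bool dtype holds only True/False (one byte per value) and has no missing marker.
flags = np.array([True, False, True])
print(flags.dtype, flags.dtype.itemsize)

# Mixing None into plain Python booleans makes pandas fall back to the generic object dtype.
with_missing = pd.Series([True, None, False])
print(with_missing.dtype)

# Assumption: pandas >= 1.0, whose nullable "boolean" dtype tracks missing entries in a mask.
nullable = pd.Series([True, None, False], dtype="boolean")
print(nullable.dtype)
print(nullable.isna().tolist())
```

The object fallback works, but it stores each entry as a Python object, which loses the compact, vectorised storage of a native boolean column.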
The New York City Taxi & Limousine Commission Trip Record Data is a great dataset for getting started with data engineering or for teaching it. It has several properties that make it particularly useful, which we will show in this article. We will look at this data using only pandas, without introducing any other tooling. Many properties...
At the moment, there are two hot topics in Computer Science: AI and Blockchain. Behind these two buzzwords, there are industries striving to build successful products. Currently, I work in the sector often labelled as AI, which is also frequently described with terms like Machine Learning or Big Data. In this sector, the most sought-after job at the moment is the...