-
Let people invite themselves to Google Calendar entries using Apps Script
·If you want to organise an event with a group of people within your Google Workspace, you can invite the whole workspace or ask around who wants to attend. It has been the norm at my current workplace to post in Slack and let people react with an emoji if they wish to attend. This was convenient as any attendee...
-
"Killed: 9" – Getting codesigning to work with conda-pack on Apple Silicon (osx-arm64)
·If your error message is simply “Killed: 9”, you know you are in for a long debug session. Sadly, in this case, even a debugger would not give you any further information. As a result, this was a problem that could only be fixed with good insight into the packaging peculiarities of the particular platform.
-
Automating miniforge updates using GitHub Actions
·miniforge and its variants miniforge-pypy and mambaforge-* are the base installers for using conda with conda-forge as the default source for packages. They will provide you with a basic conda installation to get started. This means that as part of that, the newest installers should also bring the newest...
-
The implications of pickling ML models
·When you have trained a machine learning model (pipeline), you will make predictions directly afterwards to assess its quality. When actually using the model for something useful, we also want to make predictions with it at a later point in time. This forces us to store the model to disk and think of a way to serialise it.
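As a minimal sketch of what "serialise it" means with Python's built-in pickle module, the snippet below round-trips a toy model. `MeanModel` is a hypothetical stand-in for a real trained pipeline, chosen so the example runs without any ML library installed.

```python
import pickle


class MeanModel:
    """Hypothetical stand-in for a trained model: predicts the mean of the training targets."""

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, n):
        return [self.mean_] * n


model = MeanModel().fit([1.0, 2.0, 3.0])

# Serialise the fitted model to bytes (pickle.dump would write to a file instead).
blob = pickle.dumps(model)

# Later, possibly in another process: restore it and predict again.
restored = pickle.loads(blob)
print(restored.predict(2))  # [2.0, 2.0]

# pickle stores only a reference to the class (module + name), not its code,
# so MeanModel must be importable wherever the model is unpickled.
print(b"MeanModel" in blob)  # True
```

The last line hints at one of the implications: the pickle holds the fitted state, but the code that interprets it has to be available (and compatible) at load time.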
-
Deploying conda environments in (Docker) containers - The Cheatsheet!
·Deploying conda environments inside a container looks like a straight-forward conda install. But with a bit more love for details, you can optimise the process so that the build is faster and the resulting container much smaller.
-
Deploying conda environments in (Docker) containers - how to do it right
·Deploying conda environments inside a container looks like a straight-forward conda install. But with a bit more love for details, you can optimise the process so that the build is faster and the resulting container much smaller.
-
Apache Arrow on the Apple M1
·In the previous blog post, I explained how I got a well-working setup on my M1 MacBook. With that in place, I mostly had my main work setup running. But as a core Apache Arrow developer, I was also very eager to go the extra mile and get Arrow (the C++ and Python part) working on the M1....
-
The first two weeks with the Apple M1
·Apple recently published new computers that contain their new M1 processors. I was quite excited about them because of the promises made by various benchmarks regarding performance and energy consumption, but also because it is a new platform. Most things won’t work there, and some assumptions about how we work today have to change if you want to use...
-
Fast JDBC access in Python using pyarrow.jvm (2020 edition)
·About a year ago, I benchmarked accessing databases through JDBC in Python. Recently, the maintainer of jpype gave me a heads-up that they had significantly improved performance on their side. While this is actually the library I’m comparing my pyarrow.jvm-based approach to, I have a high appreciation for any performance tuning that is...
-
Calculating Levenshtein distances with fletcher
·Levenshtein distance is a typical measure to compare two different strings. It gives you the minimal number of add, remove and replace operations to transition from one string to another.
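To make the definition concrete, here is a plain-Python sketch of the classic dynamic-programming formulation. It is not fletcher's implementation or API, just the textbook algorithm on two Python strings; the function name `levenshtein` is chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of add, remove and replace operations turning a into b."""
    # previous[j] = distance between the first i-1 characters of a
    # and the first j characters of b (row i-1 of the DP table).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" needs i removals
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # remove ca
                current[j - 1] + 1,      # add cb
                previous[j - 1] + cost,  # replace ca by cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows of the table means memory grows with the length of `b` rather than with the product of both lengths.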
-
Trimming down pyarrow’s conda footprint (Part 2 of X)
·We have again reduced the footprint of creating a conda environment with pyarrow. This time we have done some detective work on the package contents and removed contents from thrift-cpp and pyarrow that are definitely not needed at runtime.
-
Removing Python as a dependency of R
·Surprisingly, Python was a runtime dependency of R on conda-forge. As R doesn’t need Python to run, this was a bit weird. We got rid of this by splitting up the GLib package.
-
Trimming down pyarrow’s conda footprint (Part 1 of X)
·We have substantially reduced the footprint of creating a conda environment with pyarrow. While working on this, we have also substantially reduced the size of a base Python installation from conda-forge. All this was done without disabling any functionality. We reduced the size of a conda environment for pyarrow by nearly 50% and reduced the “pyarrow tax” for...
-
Building R Arrow on Windows: A tale of two compilers
·Windows support for Apache Arrow is pretty good. There are Python wheels, Python conda packages and a binary build for R on CRAN. One thing that had been missing for a long time, though, was a conda package for R Arrow on Windows. Thanks to a lot of experimentation and some important suggestions by Isuru Fernando (Thanks!), we...
-
The one pandas internal I teach all my new colleagues: the BlockManager
·When new members join our team, they are usually already fluent in data analysis with pandas and know their way around the typical quirks. They know that they should use vectorised functions where possible and avoid using apply with a slow Python callable. There are two main reasons I teach them the BlockManager quite...