Automating miniforge updates using GitHub Actions

miniforge and its variants miniforge-pypy and mambaforge-* are the base installers for using conda with conda-forge as the default source for packages. They provide you with a basic conda installation to get started. Because they are the starting point, the newest installers should also ship the newest conda and mamba versions.

In addition to the miniforge installers, conda-forge also provides Docker images with miniforge installed inside them. The Docker images are managed through the miniforge-images repository. They are based on a small Ubuntu 20.04 base container and have the correspondingly named miniforge variant installed in /opt/conda. They are published on both Docker Hub and quay.io.

Until recently, the version updates were all done manually whenever one of the maintainers noticed that they were out of sync. For the container images, this led to the extreme situation that only a single update had ever been made. While these updates are simple, they did not get much attention, as they neither fixed any critical bug nor brought features people were eagerly looking forward to. The changes were mostly small, incremental improvements.

Because the updates in these repositories are simple to make, they are also simple to automate with GitHub Actions.

Updating mamba in Miniforge

As a first step, I built a GitHub Actions workflow that updates the mamba version in the miniforge repository. The workflow runs a small Python script that queries the Anaconda API for the latest mamba version and writes it into the Miniforge3/construct.yaml file. If the file changed, the workflow opens a new PR on the repository.

The most recent version on anaconda.org can be obtained with the following Python snippet. It fetches all available versions of the package from the Anaconda API and uses the version module of the packaging library to parse and compare the version numbers.

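import requests
from packaging import version

def get_most_recent_version(name):
    # List all files of the package on the conda-forge channel.
    request = requests.get(
        "https://api.anaconda.org/package/conda-forge/" + name
    )
    request.raise_for_status()

    # Pick the entry with the highest version according to packaging's
    # parser, not according to plain string comparison.
    pkg = max(
        request.json()["files"], key=lambda x: version.parse(x["version"])
    )
    return pkg["version"]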
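To see why a version-aware comparison matters, the sketch below runs the same max(..., key=...) selection on a hard-coded list shaped like the API's "files" entries (the sample versions are made up). As a stdlib-only stand-in for packaging.version.parse it uses a hypothetical numeric_key helper that only understands plain dotted release numbers; unlike packaging, it does not handle pre-releases.

```python
# Hypothetical sample shaped like the "files" list returned by the Anaconda API;
# the version numbers are made up for illustration.
files = [
    {"version": "0.7.3"},
    {"version": "0.7.10"},
    {"version": "0.7.9"},
]


def numeric_key(ver):
    """Simplified stand-in for packaging.version.parse.

    Only handles plain dotted release numbers such as "0.7.10";
    pre-releases like "0.8.0rc1" are not supported.
    """
    return tuple(int(part) for part in ver.split("."))


def pick_latest(entries, key):
    """Return the version string of the entry with the highest key."""
    return max(entries, key=lambda x: key(x["version"]))["version"]


# Plain string comparison is lexicographic and picks the wrong entry.
lexical = pick_latest(files, key=str)
# Comparing the numeric parts picks the actual latest release.
numeric = pick_latest(files, key=numeric_key)
print(lexical, numeric)
```

String comparison ranks "0.7.9" above "0.7.10" because "9" sorts after "1", which is exactly the mistake a proper version parser avoids.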

After receiving the latest version, we load the Miniforge3/construct.yaml file and rewrite it with that version. We do this without checking the current value, as the following step in the workflow detects for us whether anything changed. If there is a change, the peter-evans/create-pull-request action creates a new pull request (or updates the existing one if one is already open).
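The rewrite step itself is not shown here. A minimal sketch, assuming that construct.yaml pins mamba with a spec line of the form "  - mamba 0.7.3" (an assumption about the file layout), could use a regular expression so that the rest of the file, including its Jinja templating, stays untouched. The set_mamba_version helper and the sample content are hypothetical.

```python
import re

# Assumed layout of the relevant part of Miniforge3/construct.yaml;
# the real file may look different.
SAMPLE = """specs:
  - python 3.8.*
  - conda {{ version }}
  - mamba 0.7.3
"""


def set_mamba_version(text, new_version):
    """Hypothetical helper: replace the version on the '- mamba X' spec line.

    The rewrite is unconditional; detecting whether the file actually
    changed is left to the following workflow step.
    """
    return re.sub(
        # [ \t] instead of \s so the pattern never crosses a line break.
        r"^([ \t]*-[ \t]*mamba[ \t]+)\S+",
        lambda m: m.group(1) + new_version,
        text,
        flags=re.MULTILINE,
    )


updated = set_mamba_version(SAMPLE, "0.7.4")
print(updated, end="")
```

Running the helper twice with the same version yields the same text, so an unchanged file simply produces no diff and no pull request.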

Note that we have pinned the action to an exact commit ID. This is an additional security measure that protects us in case the action itself gets compromised. We have checked that the pinned version doesn't do anything we would be afraid of. Had we only pinned it to a tag, an attacker could move that tag to malicious code, obtain a GitHub token with write access to the repository and modify the contents or releases of the miniforge repository.

- name: Create Pull Request
  id: cpr
  # This is the v3 tag but for security purposes we pin to the exact commit.
  uses: peter-evans/create-pull-request@052fc72b4198ba9fbc81b818c6e1859f747d49a8
  with:
    commit-message: "Update mamba version"
    title: "Update mamba version"
    body: |
      This PR was created by the autoupdate action as it detected that
      the mamba version has changed and thus should be updated
      in the configuration.
    branch: autoupdate-action
    delete-branch: true
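Pinning only helps if it is applied consistently. A minimal sketch of a check for this, using a hypothetical is_pinned_to_commit helper and assuming that "pinned" means a full 40-character commit SHA after the @ in a uses: value:

```python
import re

# owner/repo, an optional sub-path, then "@" and a full 40-character commit SHA.
_SHA_PIN = re.compile(r"^[\w.-]+/[\w.-]+(/[\w./-]+)?@[0-9a-f]{40}$")


def is_pinned_to_commit(uses):
    """Return True if a workflow `uses:` value points at an exact commit."""
    return bool(_SHA_PIN.match(uses))


print(is_pinned_to_commit(
    "peter-evans/create-pull-request@052fc72b4198ba9fbc81b818c6e1859f747d49a8"
))
print(is_pinned_to_commit("actions/checkout@v2"))
```

The sketch deliberately ignores local actions (./path) and docker:// references, which are not resolved through a tag or branch of a third-party repository.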

Normally, when a GitHub Actions workflow opens a pull request, the CI checks do not run automatically on it; a user has to close and reopen the pull request manually to start them. This is a measure to prevent workflows from re-triggering themselves in a cyclic chain. You can, however, override this behaviour by making the commit for the PR with an (SSH) deploy key. To integrate this into the workflow, create such a key, add its public part to the repository as a deploy key and its private part as a secret, and then use it to check out the code:

- uses: actions/checkout@v2
  with:
    # Placeholder secret name: use the secret that holds your deploy key.
    ssh-key: ${{ secrets.SSH_PRIVATE_KEY }}

With this, the workflow is complete. We schedule it to run every 6 hours:

on:
  schedule:
    - cron: "0 */6 * * *"
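The five cron fields are minute, hour, day of month, month and day of week, and GitHub evaluates schedules in UTC. "*/6" in the hour field is a step value, so the workflow fires at minute 0 of every sixth hour. The sketch below expands just that field with a hypothetical expand_cron_field helper that only supports the "*" and "*/N" forms used here, not lists, ranges or names.

```python
def expand_cron_field(field, lowest, highest):
    """Expand a single cron field that is '*' or '*/N' into its values.

    Hypothetical helper: only the two forms used in this workflow are
    supported.
    """
    if field == "*":
        return list(range(lowest, highest + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(lowest, highest + 1, step))
    raise ValueError(f"unsupported cron field: {field!r}")


# Split the expression into minute, hour and the three remaining fields.
minute, hour, *_ = "0 */6 * * *".split()
hours = expand_cron_field(hour, 0, 23)
print(f"runs at minute {minute} of hours {hours} (UTC)")
```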

Open an issue on new conda releases

The second workflow I implemented raises an issue when a new conda version is released. Here we opted to only raise an issue, because miniforge takes its conda version from the git tag. Thus a new conda version requires no code change; we simply need to push a new tag to the repository.

For this workflow, I decided to explore JavaScript with the actions/github-script@v3 action. Unlike Python, JavaScript is not my language of choice, but it is the native language of GitHub Actions, so the startup time is nearly non-existent.

With actions/github-script@v3, we already get an initialised GitHub client in the github variable that we can use to query the GitHub API. We use it to get the latest release of the miniforge repository. Sadly, the API only offers an endpoint for the latest release, not for the latest tag. Thus the check can give wrong results if miniforge is tagged but no release is published for that tag.

The following snippet extracts the conda version from the tag of the latest release. Separately, we fetch the latest conda version from the Anaconda API.

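github.repos.getLatestRelease({
  owner: context.repo.owner,
  repo: context.repo.repo,
}).then((release) => {
  // The conda version is the part of the tag before the first dash.
  const current_version = release['data']['tag_name'].split("-")[0];
  // ... compare current_version with the latest conda version ...
});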
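The comparison itself is elided in the snippet above. A sketch of that decision in Python, assuming release tags of the form "<conda version>-<build>" (the same form as the tag the issue below asks for) and plain dotted version numbers; the function names and sample versions are illustrative only.

```python
def conda_version_from_tag(tag_name):
    """Mirror of the JavaScript split("-")[0]: keep the part before the dash."""
    return tag_name.split("-")[0]


def version_tuple(ver):
    """Simplified numeric comparison key for plain dotted versions."""
    return tuple(int(part) for part in ver.split("."))


def needs_new_release(latest_tag, latest_conda):
    """True if the conda on anaconda.org is newer than the one in the tag."""
    current = conda_version_from_tag(latest_tag)
    return version_tuple(latest_conda) > version_tuple(current)


print(needs_new_release("4.9.2-0", "4.9.2"))   # same conda version
print(needs_new_release("4.9.2-0", "4.10.0"))  # a newer conda was released
```

Comparing numeric tuples rather than strings matters here for the same reason as with mamba: "4.10.0" must rank above "4.9.2".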

If the latest conda version is higher than the version encoded in the miniforge release, we check whether there is already an open issue asking for a new release. If there is none, we use the github client to open a new one.

// conda_version holds the latest conda version from the Anaconda API.
github.issues.listForRepo({
  owner: context.repo.owner,
  repo: context.repo.repo,
  state: "open",
  labels: "[bot] conda release"
}).then((issues) => {
  if (issues.data.length === 0) {
    github.issues.create({
      owner: context.repo.owner,
      repo: context.repo.repo,
      title: "New conda release: please tag a miniforge release",
      body: "A new conda release was found, please tag a new miniforge release with `" + conda_version + "-0`",
      labels: ["[bot] conda release"]
    });
  }
});

Using these building blocks, we already have a fully working workflow that opens an issue asking for a new tag of miniforge on a conda release.

Updating miniforge-images on miniforge release

With miniforge now fully automated, we can shift our focus to miniforge-images, the repository for which I initially started this effort. The final workflow, which opens pull requests on new miniforge releases, combines the ideas of both workflows above.

We can again use the GitHub releases API to retrieve the latest tag of miniforge:

github.repos.getLatestRelease({
  owner: 'conda-forge',
  repo: 'miniforge',
}).then((release) => {
  const miniforge_version = release['data']['tag_name'];
  // ... the sed calls below run here, inside this callback ...
});

We then fall back to sed to search and replace the version number in the files that contain it:

exec("sed -i -e 's/MINIFORGE_VERSION: \"[0-9.\\-]*\"/MINIFORGE_VERSION: \"" + miniforge_version + "\"/' azure-pipelines.yml", (error, stdout, stderr) => { if (error) core.setFailed(stderr); });

exec("sed -i -e 's/MINIFORGE_VERSION=[0-9.\\-]*/MINIFORGE_VERSION=" + miniforge_version + "/' ubuntu/Dockerfile", (error, stdout, stderr) => { if (error) core.setFailed(stderr); });
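These one-liners nest a regular expression inside a sed script inside a JavaScript string, which makes the quoting easy to get wrong. For comparison, here are the same two substitution patterns as a Python sketch; the function names and the sample file contents are hypothetical and only stand in for the real azure-pipelines.yml and ubuntu/Dockerfile.

```python
import re


def bump_azure_pipelines(text, miniforge_version):
    """Same pattern as the first sed call: MINIFORGE_VERSION: "<version>"."""
    return re.sub(
        r'MINIFORGE_VERSION: "[0-9.\-]*"',
        lambda m: f'MINIFORGE_VERSION: "{miniforge_version}"',
        text,
    )


def bump_dockerfile(text, miniforge_version):
    """Same pattern as the second sed call: MINIFORGE_VERSION=<version>."""
    return re.sub(
        r"MINIFORGE_VERSION=[0-9.\-]*",
        lambda m: f"MINIFORGE_VERSION={miniforge_version}",
        text,
    )


# Illustrative one-line excerpts; the real files contain more content.
azure = '    MINIFORGE_VERSION: "4.9.2-0"\n'
docker = "ARG MINIFORGE_VERSION=4.9.2-0\n"
print(bump_azure_pipelines(azure, "4.9.2-2"), end="")
print(bump_dockerfile(docker, "4.9.2-2"), end="")
```

Passing a function as the replacement means the new version string is inserted literally, so characters in it never need escaping.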

Once the files have been modified, we again use the peter-evans/create-pull-request action to open a new pull request if they differ from master.

The workflow was added in miniforge-images#5 and received a small fix in miniforge-images#8, because I had accidentally hard-coded the miniforge version in the initial one.

Tricks for developing these workflows

While developing these workflows, I learnt some tricks on how to effectively develop these kinds of automation workflows.

The most important trick is to develop these workflows in a separate development repository. As the final workflow will only be triggered from the main / master branch, you will push quite a number of commits (or amend-and-force-pushes) to that branch. This is something the other collaborators on the project won't appreciate on the main repository.

Additionally, you could develop the workflow with the final cron trigger and just a shorter scheduling interval, but that still leads to long feedback cycles. The issue is that cron triggers are not activated immediately after the push but may be delayed by an unknown amount of time.

Finally, if you are automatically creating pull requests, GitHub by default already disables automatic workflow runs on them. This prevents you from building infinite CI loops. Once you have ensured that your workflow cannot trigger such a loop, you can switch to using deploy keys for making the pull requests. Before working on this, I had assumed that deploy keys were read-only, but they can actually be granted write access, too.

Title picture: Photo by C M on Unsplash