Making CI workflows faster with GitHub Actions

Few things are as destructive to team performance as slow feedback loops: they lead to slow product development, slow deployment and ultimately a poor developer experience for the team. Mature teams shorten these loops by introducing Continuous Integration (CI) into the development life cycle.

One relatively new tool for CI and workflow automation is GitHub Actions. This post looks at using it in Python-based projects.

GitHub Actions are jobs/pipelines that help you automate your development workflows. You can use them to create individual tasks and then combine them into custom workflows, which are then executed, for example, on every push to the repository or when a release is created.

CI with GitHub Actions

Let's get specific. We have a task: we need to run a linter and unit tests for a Python project managed with Poetry.

Workflow

To use GitHub Actions, we need to define a CI pipeline that runs on the triggers we choose (such as a push to the repository). In GitHub Actions these pipelines are called workflows. They are YAML files that reside in the .github/workflows directory of our repository:

.github
└── workflows
    └── on-push.yml

on-push.yml contains a workflow that is triggered on every push to the repository. Let's look at it:

on:
  push:
    branches:
      - '**'

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.8
        uses: actions/setup-python@v2
        with:
          python-version: "3.8"
      - name: Install poetry
        run: |
          python -m pip install --upgrade pip
          python -m pip install poetry
      - name: Set poetry config
        run: python -m poetry config virtualenvs.create false
      - name: Install dependencies
        run: python -m poetry install
      - name: Safety first
        run: poetry run safety check
      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide, we make it 120 chars
          poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=120 --statistics
      - name: Test with pytest
        run: poetry run pytest tests/

In addition to the workflow, GitHub Actions introduces several other entities.

Event

The first thing we see in the file is the event. This describes the GitHub event that will trigger the workflow. There are a number of repository events that can be used to trigger workflows including pushes to a repo, pull requests, releases, and many more. 

on:
  push:
    branches:
      - '**'
...

In our example above, the event is a push to any branch.
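
As a rough sketch of how other events could be combined, a single workflow can list several triggers under on. The branch name main and the choice of events here are assumptions for illustration, not part of the workflow used in this post:

```yaml
# Illustrative only: the branch name "main" is an assumption.
on:
  push:
    branches:
      - main            # run on pushes to main only
  pull_request:
    branches:
      - main            # run on pull requests targeting main
  release:
    types: [published]  # run when a release is published
```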

Job

The jobs keyword defines a block that lists the jobs for the workflow. Each workflow must have at least one job, and each job is identified by a string that we choose. In our case it is called "build":

...
jobs:
  build:
    runs-on: ubuntu-latest
...

The runs-on attribute lets GitHub know the type of machine we'd like to use to run this job. Essentially, it describes the operating system of the virtual environment where the job will run.
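
If the same job should run on more than one operating system, runs-on can take its value from a build matrix. This is only a sketch: the list of systems below is an assumption, and the workflow in this post uses ubuntu-latest alone.

```yaml
jobs:
  build:
    # Assumption: we want to check the project on all three hosted OS images.
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    # One copy of the job is started for every entry in matrix.os.
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
```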

Steps

One job can consist of multiple steps. Each step has access to the file system of the virtual environment but runs in its own separate process. Each step runs either an action or a shell command. The action is the reusable unit that gives GitHub Actions its name: it is the smallest part that can be reused in our CI.
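
A small sketch of what "shared file system, separate process" means in practice, assuming the default bash shell on ubuntu-latest. The file name and variable names are hypothetical and not part of this post's workflow:

```yaml
steps:
  - name: Write a file and set variables
    run: |
      # Files written to the workspace stay there for later steps.
      echo "hello" > note.txt
      # A plain shell variable disappears when this step's process exits.
      LOCAL_ONLY=1
      # Lines appended to $GITHUB_ENV are exported to all later steps.
      echo "SHARED_VALUE=42" >> "$GITHUB_ENV"
  - name: Read them back
    run: |
      cat note.txt                            # prints: hello
      echo "LOCAL_ONLY=${LOCAL_ONLY:-unset}"  # prints: LOCAL_ONLY=unset
      echo "SHARED_VALUE=$SHARED_VALUE"       # prints: SHARED_VALUE=42
```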

Let's move on to the CI steps we have listed:

...
steps:
  - uses: actions/checkout@v2
  - name: Set up Python 3.8
    uses: actions/setup-python@v2
    with:
      python-version: "3.8"
  - name: Install poetry
    run: |
      python -m pip install --upgrade pip
      python -m pip install poetry
  - name: Set poetry config
    run: python -m poetry config virtualenvs.create false
  - name: Install dependencies
    run: python -m poetry install
  - name: Safety first
    run: poetry run safety check
  - name: Lint with flake8
    run: |
      # stop the build if there are Python syntax errors or undefined names
      poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
      # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide, we make it 120 chars
      poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=120 --statistics
  - name: Test with pytest
    run: poetry run pytest tests/

We have multiple steps, and each step has either a uses or a run attribute.

The main purpose of the uses attribute is to tell the workflow how to find the action needed by the step. Actions are bundles of code that perform a specific task or operation. They can be located in the same repository as the workflow, in another public GitHub repository, or even in a container registry such as Docker Hub.
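
The three locations correspond to three forms of the uses value. The sketch below is illustrative: the local action path is hypothetical, and the Docker image is just an example tag.

```yaml
steps:
  # Action from a public GitHub repository, pinned to a version tag.
  - uses: actions/checkout@v2
  # Action stored in the same repository as the workflow
  # (hypothetical path; the repository must be checked out first).
  - uses: ./.github/actions/my-action
  # Action published as a container image on Docker Hub.
  - uses: docker://alpine:3.8
```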

If we aren't using an action for a step, we can use the run attribute. This executes a command or series of commands in a shell on the virtual environment. 

Describing our CI steps:

  1. The first step gets the current version of the source code by checking out our repository with the checkout action.
  2. The next step says something like: "Please set up Python for me". It uses an action called setup-python, which installs Python for us, in our case Python 3.8.
  3. Install poetry installs Poetry, the package manager used in the project.
  4. Set poetry config tells Poetry not to create a virtual environment, so packages are installed into the Python environment that setup-python provided.
  5. Install dependencies installs all of our dependencies using Poetry.
  6. Safety first runs security checks using the safety project, which checks our installed dependencies for known security vulnerabilities.
  7. Lint with flake8 runs flake8 twice: the first run fails the build on syntax errors or undefined names, the second reports everything else as warnings only.
  8. Test with pytest runs pytest against the tests in the tests/ directory.

GitHub cache

There are only two hard things in Computer Science: cache invalidation and naming things.
Phil Karlton

GitHub Actions supports file system caching, so we can avoid repeating the poetry install step (or pip install, pipenv install, npm install, or whatever your package manager is if you're not using Python).

I went one step further and cached the entire Python environment. The hardest part of this approach is, as the quote above says, cache invalidation: the cached environment has to be rebuilt when the project dependencies change. In a Poetry-based project this can be done by detecting when the poetry.lock file changes (or requirements.txt in a pip-based project, Pipfile.lock in a pipenv-based project, or whatever requirements files your project uses):

- name: Cache poetry virtualenv
  uses: actions/cache@v2
  id: cache
  with:
    path: ${{ env.pythonLocation }}
    key: ${{ runner.os }}-python-${{ env.pythonLocation }}-${{ hashFiles('poetry.lock') }}

The Cache poetry virtualenv step creates a cache keyed by the hash of our poetry.lock file. When the project dependencies change, the key changes too, so the old cache no longer matches and a fresh one is built.

The Install poetry and Install dependencies steps that follow install the dependencies, but only if no cache hit occurred. This is a huge time-saver if you have a large project.

- name: Install poetry
  if: steps.cache.outputs.cache-hit != 'true'
  run: |
    python -m pip install --upgrade pip
    python -m pip install poetry
- name: Set poetry config
  run: python -m poetry config virtualenvs.create false
- name: Install dependencies
  if: steps.cache.outputs.cache-hit != 'true'
  run: python -m poetry install
- name: Safety first
  run: poetry run safety check
- name: Lint with flake8
  run: |
    # stop the build if there are Python syntax errors or undefined names
    poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
    # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide, we make it 120 chars
    poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=120 --statistics
- name: Test with pytest
  run: poetry run pytest tests/
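
The snippets above do not show where the cache step sits relative to the others. One plausible ordering, assuming the key relies on the pythonLocation variable that setup-python exports, is to place it right after Set up Python and before the install steps:

```yaml
steps:
  - uses: actions/checkout@v2
  - name: Set up Python 3.8
    uses: actions/setup-python@v2   # exports the pythonLocation environment variable
    with:
      python-version: "3.8"
  - name: Cache poetry virtualenv
    uses: actions/cache@v2          # placed after setup-python so env.pythonLocation is defined
    id: cache
    with:
      path: ${{ env.pythonLocation }}
      key: ${{ runner.os }}-python-${{ env.pythonLocation }}-${{ hashFiles('poetry.lock') }}
  - name: Install poetry
    if: steps.cache.outputs.cache-hit != 'true'
    run: |
      python -m pip install --upgrade pip
      python -m pip install poetry
  # ...the remaining steps follow unchanged
```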

As a result of the first run, we see this:

[Screenshot: GitHub Actions run showing the duration of each step]

Note the time it took to complete each step. If you look closely, you will see that installing the dependencies took 29s and installing poetry took 9s. Since the environment is cached on subsequent runs (assuming the dependencies are unchanged), both steps should be skipped and take 0s, saving us more than half a minute:

[Screenshot: GitHub Actions, subsequent run]

For reliability, you can also invalidate the cache if the actual workflow file changes (on-push.yml), but that's up to you.
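
A minimal sketch of that variant, assuming the workflow file lives at .github/workflows/on-push.yml as shown earlier. hashFiles accepts several patterns, so a change to either file produces a new key:

```yaml
- name: Cache poetry virtualenv
  uses: actions/cache@v2
  id: cache
  with:
    path: ${{ env.pythonLocation }}
    # The hash now covers both the lock file and the workflow definition.
    key: ${{ runner.os }}-python-${{ env.pythonLocation }}-${{ hashFiles('poetry.lock', '.github/workflows/on-push.yml') }}
```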

As the test suite grows, we could also increase the number of test runners from 2 to 4, but that may not matter much since the current version of GitHub Actions only has 2 cores per container.
