Few things are as destructive to team performance as slow feedback loops: they lead to slow product development, slow deployment, and ultimately a poor developer experience for the team. Mature teams shorten these loops by introducing Continuous Integration (CI) into the development life cycle.
One new tool for CI and workflow automation is GitHub Actions. This post looks at how to use it in Python-based projects.
GitHub Actions are jobs/pipelines that help you automate your development workflows. You can use them to create individual tasks and then combine them into custom workflows, which are executed, for example, on every push to the repository or when a release is created.
CI with GitHub Actions
Let's get specific. We have a task: we need to run a linter and unit tests for a Python project that uses Poetry.
Workflow
To use GitHub Actions, we need to define a CI pipeline that runs on the triggers we choose (such as a push to the repository). In GitHub Actions these pipelines are called workflows: YAML files that reside in the .github/workflows directory of our repository:
.github
└── workflows
    └── on-push.yml
on-push.yml contains a workflow that is triggered on every push to the repository. Let's look at it:
on:
  push:
    branches:
      - '**'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.8
        uses: actions/setup-python@v2
        with:
          python-version: "3.8"
      - name: Install poetry
        run: |
          python -m pip install --upgrade pip
          python -m pip install poetry
      - name: Set poetry config
        run: python -m poetry config virtualenvs.create false
      - name: Install dependencies
        run: python -m poetry install
      - name: Safety first
        run: poetry run safety check
      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide, we make it 120 chars
          poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=120 --statistics
      - name: Test with pytest
        run: poetry run pytest tests/
In addition to the workflow, GitHub Actions introduces several other entities.
Event
The first thing we see in the file is the event. This describes the GitHub event that will trigger the workflow. There are a number of repository events that can be used to trigger workflows including pushes to a repo, pull requests, releases, and many more.
on:
  push:
    branches:
      - '**'
...
In our example above, the trigger is a push to any branch.
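To illustrate the other events mentioned above, here is a minimal sketch of a trigger block that reacts to pushes, pull requests, and releases. The branch name main and the choice of the published release type are assumptions made for the example; they are not part of the workflow in this post.

```yaml
# Sketch only: the event names are real GitHub events,
# but the branch name "main" is an assumption for illustration.
on:
  push:
    branches:
      - main          # run on pushes to main only
  pull_request:
    branches:
      - main          # run on pull requests that target main
  release:
    types:
      - published     # run when a release is published
```

Narrowing the branch filter like this is a trade-off: fewer runs, but feature branches get feedback only once a pull request is opened.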
Job
The jobs keyword defines a block that lists the jobs for the workflow. Each workflow must have at least one job, and each job is identified by a string that we choose. In our case, it is called “build”:
...
jobs:
  build:
    runs-on: ubuntu-latest
    ...
The runs-on attribute tells GitHub which type of machine we'd like to use to run this job. Essentially, it describes the operating system of the virtual environment where the job will run.
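As an illustration of how runs-on selects the operating system, here is a minimal sketch that runs the same job on several runner types. The strategy/matrix block and the list of operating systems are not part of the workflow in this post; they are assumptions added for the example.

```yaml
jobs:
  build:
    # runs-on takes its value from the matrix below, one job per entry
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        # Illustrative choice of GitHub-hosted runners (not from the original workflow)
        os: [ubuntu-latest, windows-latest, macos-latest]
    steps:
      - uses: actions/checkout@v2
```

For a project that only targets Linux, the single ubuntu-latest value used in this post is enough.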
Steps
One job can consist of multiple steps. Each step has access to the file system in the virtual environment but runs in its own separate process. A step can invoke an action: the reusable unit that gives GitHub Actions its name, that is, the smallest piece that can be reused across CI pipelines.
Let's move on to the CI steps we have listed:
...
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.8
        uses: actions/setup-python@v2
        with:
          python-version: "3.8"
      - name: Install poetry
        run: |
          python -m pip install --upgrade pip
          python -m pip install poetry
      - name: Set poetry config
        run: python -m poetry config virtualenvs.create false
      - name: Install dependencies
        run: python -m poetry install
      - name: Safety first
        run: poetry run safety check
      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide, we make it 120 chars
          poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=120 --statistics
      - name: Test with pytest
        run: poetry run pytest tests/
We have multiple steps, and each step has either a uses or a run attribute.
The main purpose of the uses attribute is to tell the workflow how to find the action needed by the step. Actions are bundles of code that perform a specific task or operation. They can be located in the same repository as the workflow, in another public GitHub repository, or even in a container registry like Docker Hub.
If we aren't using an action for a step, we can use the run attribute. This executes a command or series of commands in a shell in the virtual environment.
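The following sketch shows what those variants can look like side by side. The local action path ./.github/actions/my-action is hypothetical (it assumes an action definition exists at that path in the repository), and the Docker image tag is only an example.

```yaml
steps:
  # Action from a public GitHub repository, pinned to a version tag
  - uses: actions/checkout@v2
  # Action stored in the same repository as the workflow
  # (hypothetical path; the repository must be checked out first)
  - uses: ./.github/actions/my-action
  # Action published as a Docker image on Docker Hub (example image)
  - uses: docker://alpine:3.8
  # No action at all: run shell commands directly in the virtual environment
  - name: Print Python version
    run: python --version
```

Pinning an action to a tag, as in actions/checkout@v2, keeps the workflow from silently picking up a newer version of that action.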
Describing our CI steps:
- The first step gets the current version of the source code by checking out our repository with the checkout action.
- The next step says something like: "Please set up Python for me". It uses an action called setup-python, which installs Python for us, in our case Python 3.8.
- The next step installs Poetry, the package manager used in the project.
- Set poetry config tells Poetry not to create a virtual environment, so the dependencies are installed directly into the runner's Python environment.
- The next step installs all of our dependencies using Poetry.
- Safety first runs security checks using the safety project, which checks our installed dependencies for known security vulnerabilities.
- Lint with flake8 runs flake8 twice: the first pass stops the build on Python syntax errors or undefined names, and the second pass reports everything else as warnings.
- Test with pytest runs pytest against our set of tests.
GitHub cache
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
GitHub Actions supports file system caching, so we can avoid repeating the poetry install step (or pip install, pipenv install, npm install, or whatever your package manager is if you're not using Python).
I went one step further and cached the entire Python environment. The hardest part of this approach is, as the quote above says, cache invalidation. The cached environment must be refreshed whenever the project dependencies change. In a Poetry-based project we can detect this by watching for changes to the poetry.lock file (or requirements.txt in a pip-based project, Pipfile.lock in a pipenv-based project, or whatever requirements files your project uses):
      - name: Cache poetry virtualenv
        uses: actions/cache@v2
        id: cache
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ runner.os }}-python-${{ env.pythonLocation }}-${{ hashFiles('poetry.lock') }}
Cache poetry virtualenv creates a cache and keys it by the hash of our poetry.lock file. When our project dependencies change, the key changes too, so the old cache entry no longer matches and the environment is rebuilt and cached under the new key.
The following steps install dependencies (the Install poetry and Install dependencies steps), but only if no cache hit occurred. This is a huge time-saver if you have a large project.
      - name: Install poetry
        if: steps.cache.outputs.cache-hit != 'true'
        run: |
          python -m pip install --upgrade pip
          python -m pip install poetry
      - name: Set poetry config
        run: python -m poetry config virtualenvs.create false
      - name: Install dependencies
        if: steps.cache.outputs.cache-hit != 'true'
        run: python -m poetry install
      - name: Safety first
        run: poetry run safety check
      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide, we make it 120 chars
          poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=120 --statistics
      - name: Test with pytest
        run: poetry run pytest tests/
As a result of the first run, we see this:
Note the time it took to complete each step. If you look closely, you will see that installing the dependencies took 29s and installing poetry took 9s. Since the environment is cached, on subsequent runs (assuming the dependencies are unchanged) these steps should take 0s, saving us more than half a minute.
For reliability, you can also invalidate the cache when the workflow file itself (on-push.yml) changes, but that's up to you.
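One way to do that, sketched here under the assumption that the workflow lives at .github/workflows/on-push.yml as shown earlier, is to pass both files to hashFiles so that a change to either one produces a new cache key:

```yaml
- name: Cache poetry virtualenv
  uses: actions/cache@v2
  id: cache
  with:
    path: ${{ env.pythonLocation }}
    # hashFiles accepts several patterns; a change to either file yields a new key
    key: ${{ runner.os }}-python-${{ env.pythonLocation }}-${{ hashFiles('poetry.lock', '.github/workflows/on-push.yml') }}
```

The cost is that any edit to the workflow, even an unrelated one, triggers a full reinstall on the next run.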
As the test suite grows, we could also increase the number of test runners from 2 to 4, but that may not matter much since the current version of GitHub Actions only has 2 cores per container.