

Crafting The Perfect Pipeline In GitLab

March 20, 2019

When using a traditional single-server continuous integration (CI) setup, fast, incremental builds are simple. Each time CI runs your pipeline, it can use the same workspace, preserving the state from the previous build. But what if you are using Kubernetes runners to execute your pipelines? These can be spun up or down on demand, and each pipeline execution is not guaranteed to run on the same runner, the same machine, or even in the same country as the previous one.

In this case, the most basic configuration, one that just clones and compiles the code, will always have to do a full rebuild. For larger projects a full compile can take a while, and you would lose the quick feedback loop. Could the features of GitLab CI be used to get incremental builds back? Let's explore.

Docker

The first step to tackle is the build dependencies (compilers, third-party libraries and other tools). You don't want to install these dependencies every time you build the project, and since you cannot depend on them being installed globally on any runner, the most natural solution is to use a container. Since containers may also be used for deployment, the simplest approach is to have a single Docker container that installs the compile-time and run-time dependencies, then copies, builds and executes the code. The downside is that any change to the code will invalidate the entire copy layer and every layer after it, resulting in a full rebuild.


A better solution is to use two separate containers: a builder container that includes the compile-time dependencies, and a runner container that includes only the run-time dependencies.

Docker & GitLab CI

So, how do we achieve this within GitLab CI? It is generally a good idea to split the build into several stages. This will provide better feedback regarding which stages have failed, and it allows for better load-balancing of stages between runners when multiple pipelines are running simultaneously. Stages should be defined for building the builder container, the project itself, and the runner container.
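As a rough sketch, the top of the .gitlab-ci.yml could declare one stage per step (the stage names here are illustrative, not taken from the original pipeline):

```yaml
# .gitlab-ci.yml (sketch): one stage per step described above
stages:
  - build-builder-image   # build the container holding the compile-time dependencies
  - build                 # compile the project inside the builder container
  - build-runner-image    # package the binaries into the run-time container
```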

GitLab includes its own Docker registry that can be used for storing images between stages, or you could use an external Docker registry if preferred. Either way, you must log in to your chosen registry, as well as set up stages and some variables that will be used later to tag the images.
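A minimal sketch of that setup, assuming the built-in GitLab Container Registry and its predefined variables (CI_REGISTRY, CI_REGISTRY_IMAGE, CI_COMMIT_REF_SLUG, CI_COMMIT_SHA); the variable names themselves are illustrative:

```yaml
variables:
  # Tag reused across pipelines on the same branch, purely for layer caching
  BUILDER_CACHE_TAG: $CI_REGISTRY_IMAGE/builder:$CI_COMMIT_REF_SLUG
  # Tag unique to this commit, consumed by later stages of this pipeline
  BUILDER_COMMIT_TAG: $CI_REGISTRY_IMAGE/builder:$CI_COMMIT_SHA

before_script:
  # Log in to the GitLab Container Registry in the jobs that push or pull images
  - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
```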

Defining the stage that builds the builder container comes next. Since this stage could be executed anywhere, you cannot rely on the Docker cache left over from previous builds. Instead, you must explicitly pull down any previous image and use Docker's --cache-from option to instruct it to use that image for cache checks. You tag the image both with a tag specific to this pipeline, which will be used for caching in its next execution, and with a tag based on the commit hash, which will be used in the subsequent stages of this pipeline. If you used only the first of these, multiple pipelines running at the same time could interfere with each other.
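A sketch of such a stage, assuming Docker is available to the job (for example via a docker:dind service) and a hypothetical Dockerfile.builder; the `|| true` lets the pull fail harmlessly on the very first run, when no cached image exists yet:

```yaml
build-builder-image:
  stage: build-builder-image
  script:
    # Fetch the image from the previous run of this branch, if there is one
    - docker pull "$BUILDER_CACHE_TAG" || true
    # Reuse its layers where possible, then tag with both the cache tag
    # (for the next pipeline) and the commit tag (for this pipeline's later stages)
    - >
      docker build --cache-from "$BUILDER_CACHE_TAG"
      -t "$BUILDER_CACHE_TAG" -t "$BUILDER_COMMIT_TAG"
      -f Dockerfile.builder .
    - docker push "$BUILDER_CACHE_TAG"
    - docker push "$BUILDER_COMMIT_TAG"
```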

GitLab CI Build Cache

The next step is to use the builder container to achieve incremental builds of the project itself, which requires caching of the build output between executions. The default caching will not actually work with Kubernetes runners due to their distributed nature, so you must first configure GitLab to use a central cache, such as S3.
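The S3 bucket itself is configured at runner level (in the runner's config.toml), so it does not appear in the pipeline definition; the job only needs to declare what to cache. A minimal sketch, assuming the build output lands in a hypothetical build/ directory:

```yaml
build:
  stage: build
  image: $BUILDER_COMMIT_TAG      # compile inside this pipeline's builder container
  cache:
    key: "$CI_COMMIT_REF_SLUG"    # one shared cache per branch
    paths:
      - build/                    # build output to carry over to the next execution
```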

Caching the build output alone is also not sufficient to achieve incremental builds. Each time the stage runs, the repository is cloned, which resets all the file-modification times to the current time and so causes everything to be rebuilt anyway. For more information on how I have found a way to achieve an incremental build, please visit my blog post on our Horizons research website.

Artefacts

Next, you need to transfer the build artefacts to the next stage, so they can be copied into the runner container. I recommend using the install feature of your build system to copy only the necessary binaries into an install folder.

You then configure your build stage to treat these files as artefacts. When you create the stage that builds the runner container, add the build stage as a dependency, which will cause GitLab to automatically pull across its artefacts.
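Extending the illustrative build job from above (the install/ directory and the exact build and install commands are assumptions, not taken from the original project):

```yaml
build:
  # image and cache as sketched earlier
  script:
    - make -C build                                             # hypothetical build command
    - make -C build install DESTDIR="$CI_PROJECT_DIR/install"   # hypothetical install step
  artifacts:
    paths:
      - install/      # only the binaries the runner container needs
```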

Building the runner container itself is similar to building the builder container, using a Dockerfile that copies in the binaries from the install directory.
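A sketch of that stage, assuming hypothetical RUNNER_* tag variables defined analogously to the builder ones and a Dockerfile.runner whose COPY instructions pull the binaries in from install/:

```yaml
build-runner-image:
  stage: build-runner-image
  dependencies:
    - build                        # pulls the install/ artefacts into this job's workspace
  script:
    - docker pull "$RUNNER_CACHE_TAG" || true
    - >
      docker build --cache-from "$RUNNER_CACHE_TAG"
      -t "$RUNNER_CACHE_TAG" -t "$RUNNER_COMMIT_TAG"
      -f Dockerfile.runner .
    - docker push "$RUNNER_CACHE_TAG"
    - docker push "$RUNNER_COMMIT_TAG"
```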

Running Tests & Multi-Project Pipelines

To run a suite of tests as part of the CI pipeline, you can build test containers in the same way as the runner container, except with the test binary as the entry point.
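The job that then runs them can be as simple as pulling and launching the image built for this commit; a sketch, assuming a test stage has been added to the stages list and a hypothetical tests image tagged like the others:

```yaml
test:
  stage: test                      # additional stage, added to the stages list above
  script:
    - docker pull "$CI_REGISTRY_IMAGE/tests:$CI_COMMIT_SHA"
    # The image's entry point is the test binary, so running it runs the suite
    - docker run --rm "$CI_REGISTRY_IMAGE/tests:$CI_COMMIT_SHA"
```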

For larger projects you may wish to split the CI build into multiple pipelines. Since our previous steps have each pushed their containers to a central Docker registry, the downstream pipelines can pull down and launch the containers that they require. If you want to be able to trigger these downstream pipelines automatically, you’ll need GitLab Premium for its "Multi-Project Pipelines" feature.
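With that feature, the downstream pipeline can be started from a bridge job using the trigger keyword; a minimal sketch, where the project path and branch are of course placeholders:

```yaml
trigger-downstream:
  stage: build-runner-image                    # illustrative; place it wherever fits your pipeline
  trigger:
    project: my-group/my-downstream-project    # hypothetical downstream project
    branch: master
```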

More information on some things to keep in mind when running test containers and multi-project pipelines can be found in my blog post on our Horizons research website.

Conclusion

By combining GitLab CI and Kubernetes runners with Docker and the techniques described above, you can achieve the scalability of Kubernetes while maintaining some of the speed of incremental builds. There will inevitably still be some slowdown, as pushing and pulling containers takes additional time, but build times (especially on larger projects) can be improved enough to retain the fast feedback loop, one of the main benefits of using continuous integration.

Visit our Horizons research portal for more information and follow Thales on Twitter, LinkedIn and Facebook.