Different approaches to building Docker images with Gitlab CI, drawbacks, and solutions.
My most popular blog post so far is "Setting up a Docker image builder with Gitlab CI Runner". That post focused on how to set up a relatively secure Docker build environment using Gitlab CI and a host with a Docker engine. It also touched on how to design your `.gitlab-ci.yml` pipeline to build a Docker image. I wrote that piece almost two years ago. Since then I have learned a few neat things about both Docker and Gitlab CI, and there have been some welcome updates to both that have improved my Docker build pipelines. In this post I will show you what my early implementations looked like, which problems they caused, and how later features allowed me to create better implementations. By the end of this post you should have an idea of how to set up Gitlab CI pipelines that build your own Docker images.
In the early days of Docker and Gitlab CI Runner, before Docker multistage builds and Gitlab job artifacts, building an application and putting it in a clean production Docker image was not straightforward, at least not with the `docker` executor. My solution was to use the cache feature to transfer build files from a `build` stage into a `containerize` stage, and to reference them in the `Dockerfile` used by the `docker build` command. For a Node app written in TypeScript, this solution might give you a `.gitlab-ci.yml` file like the one below:
```yaml
stages:
  - build
  - containerize

# The cache is (ab)used to hand the build output from one stage to the next.
cache:
  key: $CI_BUILD_REF
  paths:
    - dist
    - server.js
    - package.json

build:
  stage: build
  image: node:8
  script:
    - npm install
    - npm run build

containerize:
  stage: containerize
  image: docker:17
  script:
    # A few lines to set DOCKER_HOST and certificates and log in.
    # The trailing "." is the build context containing the cached files.
    - docker build -t user/my-typescript-image:latest .
    - docker push user/my-typescript-image:latest
```
The `Dockerfile` would then reference those files:
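```dockerfile
FROM node:8-alpine
WORKDIR /usr/src/app
# Install production dependencies only
COPY ./package.json /usr/src/app/
RUN npm install --production
# Copy the build output restored from the Gitlab CI cache
COPY ./dist /usr/src/app/
COPY ./server.js /usr/src/app/
# Exec form requires a comma-separated JSON array
CMD ["node", "server.js"]
```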
My approach worked relatively well, but it had some problems:
- The cached build files are not guaranteed to be there for the `containerize` job. The cache is a best-effort optimization, and the two jobs may not even run on the same runner.
- Because the cache key is unique per commit, cached build files for all your projects pile up until the cache is cleared.
- Clearing the whole cache slows down other builds that use the cache for its intended purpose.
- Depending on the cache key used, cache collisions can occur, causing race conditions where a `containerize` job ends up using build files from an unrelated `build` job.
All in all, this was not a very elegant solution, and using the cache for build output is not recommended today!
Enter Gitlab Job Artifacts
Gitlab job artifacts allow you to define the files and folders produced by a Gitlab CI job and have them archived and persisted in Gitlab. The artifacts are made available to subsequent jobs in the pipeline, which makes them ideal for pipelines where the output of one job is used in later stages. Gitlab eventually added the `expire_in` option to the artifacts feature, which lets us clean up old build artifacts automatically. Armed with this new feature, I devised a new `.gitlab-ci.yml` file:
```yaml
stages:
  - build
  - containerize

build:
  stage: build
  image: node:8
  script:
    - npm install
    - npm run build
  # Archive the build output and keep it in Gitlab for one hour.
  artifacts:
    paths:
      - dist
      - server.js
    expire_in: 1 hour

containerize:
  stage: containerize
  image: docker:17
  script:
    # A few lines to set DOCKER_HOST and certificates and log in.
    # The build job's artifacts are extracted into the working directory first.
    - docker build -t user/my-typescript-image:latest .
    - docker push user/my-typescript-image:latest
```
Here we use the `artifacts` configuration directive in the `build` job to instruct the CI runner to create an archive of our build files and persist it in Gitlab for 1 hour. When the runner executes the `containerize` job, it downloads and extracts the artifacts archive, making the files available. After 1 hour Gitlab deletes the files. Because job artifacts are stored per pipeline, collisions cannot occur. Note that `package.json` no longer needs to be passed along, since every job checks out the repository anyway. As you might have surmised, artifacts fix many of the issues present in the cache-based approach.
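By default, a job downloads the artifacts of every job in earlier stages. As the pipeline grows, the `dependencies` keyword can restrict the `containerize` job to the artifacts of the `build` job only. A minimal sketch, reusing the job names from the example above:

```yaml
containerize:
  stage: containerize
  image: docker:17
  # Only fetch artifacts from the build job, not from every earlier job.
  dependencies:
    - build
  script:
    # A few lines to set DOCKER_HOST and certificates and log in.
    - docker build -t user/my-typescript-image:latest .
    - docker push user/my-typescript-image:latest
```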
Multistage Docker builds
The artifact approach above works quite well for many build scenarios, and I still use it for some projects, but it does have some issues. Because the runner has to archive and upload the build files to Gitlab at the end of the `build` job, and then download and extract them at the start of the `containerize` job, large builds can take a while to run. And because artifacts are unique per pipeline, they will not help you with dependencies and build cache. For that you still have to use the Gitlab CI Runner's cache feature.
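As a minimal sketch, assuming `node_modules/` is what you want to reuse between pipelines, the `build` job could combine both features: the cache for dependencies, keyed per branch with the predefined `$CI_COMMIT_REF_SLUG` variable, and artifacts for this pipeline's build output. The job still runs `npm install`, so it also works when the cache happens to be empty.

```yaml
build:
  stage: build
  image: node:8
  # Cache: best-effort reuse of dependencies between pipelines on the same branch.
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - node_modules/
  script:
    # Works with or without a warm cache; a warm cache just makes it faster.
    - npm install
    - npm run build
  # Artifacts: reliable hand-off of this pipeline's build output to later stages.
  artifacts:
    paths:
      - dist
      - server.js
    expire_in: 1 hour
```

The split keeps each feature in its lane: losing the cache only costs time, while the artifacts the next stage depends on are guaranteed to belong to this pipeline.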
When Docker 17.05 was released, it brought the multistage build feature, promising a new and simpler way to build clean production images. This feature has simplified and improved how I build Docker images with Gitlab CI, as the `build` stage and job can be removed in favor of a single `containerize` step using a multistage `Dockerfile`. In essence, you change your old `Dockerfile` to use a build stage and a production stage:
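```dockerfile
# Stage 1: build the app with the full Node image and all dependencies
FROM node:8 AS builder
WORKDIR /usr/src/app
# Copy package.json first so the npm install layer is reused until it changes
COPY package.json /usr/src/app/
RUN npm install
# Copy the sources needed by `npm run build` (adjust to your project layout)
COPY ./src /usr/src/app/
RUN npm run build

# Stage 2: the production image, starting from a slim base image
FROM node:8-alpine
WORKDIR /usr/src/app
COPY package.json /usr/src/app/
RUN npm install --production
# Copy only the build output from the builder stage
COPY --from=builder /usr/src/app/dist /usr/src/app/
COPY --from=builder /usr/src/app/server.js /usr/src/app/
CMD ["node", "server.js"]
```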
The `builder` stage in the `Dockerfile`, not to be confused with a Gitlab CI stage, builds the app using regular `Dockerfile` instructions. When the `builder` stage is complete, Docker has created image layers for it. The final stage, which produces your actual Docker image, starts from the desired base image, here `node:8-alpine`. We then instruct Docker to copy files from the previous stage (`COPY --from=builder`) instead of from the local file system. The result is a clean production image without build dependencies.
You could then change your `.gitlab-ci.yml` file to use only one CI stage:
```yaml
stages:
  - containerize

containerize:
  stage: containerize
  # Multistage builds require Docker 17.05 or newer.
  image: docker:<tag>
  script:
    # A few lines to set DOCKER_HOST and certificates and log in.
    # The whole build now happens inside `docker build`.
    - docker build -t user/my-typescript-image:latest .
    - docker push user/my-typescript-image:latest
```
There are some major benefits to this approach. First, you eliminate the overhead of artifact handling in Gitlab. Second, the image layers for your `builder` stage are cached by Docker (as long as your jobs use the same Docker engine, as in the builder setup from my earlier post). If you have structured your `builder` stage correctly, for example by copying `package.json` and running `npm install` before copying the sources, Docker may already have the layers with the right dependencies and build cache, and it will go straight to the `RUN` instruction for your build (e.g. `RUN npm run build`), saving a lot of time. Finally, the multistage `Dockerfile` provides a much more consistent and portable build solution. With the artifact-based approach you might end up with one build setup for local use and another for Gitlab CI. A multistage `Dockerfile` works in both environments without any differences. This makes builds and build errors more reproducible: if your build fails in Gitlab, you can simply run `docker build -t my-container .` locally on your dev machine.
Two Dockerfiles approach
Readers with lots of Docker experience might know a different solution that I have not mentioned so far. Docker has a copy feature (`docker cp`) which allows you to copy files out of a container. One solution is thus to have two `Dockerfiles`, one to produce the build output and one to run your application. In a single `containerize` CI job you would first build a Docker image of your project, producing the build output inside the image. Then you would run a container from that image and copy out the necessary files. That container could then be stopped and removed, and you would build your production image. This solution was not apparent to me at the time, and it is in any case quite complex. The multistage feature works on the same principle, but is far simpler to implement.
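A rough sketch of such a job, assuming a hypothetical `Dockerfile.build` that runs `npm install` and `npm run build` much like the `builder` stage above, plus the production `Dockerfile` from the cache-based example. The image name `my-typescript-build` is also hypothetical. Instead of running the container, this sketch uses `docker create`, since `docker cp` works on a container that was never started. The container name includes the predefined `$CI_JOB_ID` variable so concurrent jobs on a shared Docker host do not clash.

```yaml
containerize:
  stage: containerize
  image: docker:17
  script:
    # A few lines to set DOCKER_HOST and certificates and log in.
    # 1. Build an image whose only purpose is to produce the build output.
    - docker build -f Dockerfile.build -t my-typescript-build:$CI_JOB_ID .
    # 2. Create (but do not start) a container and copy the output out of it.
    - docker create --name build-$CI_JOB_ID my-typescript-build:$CI_JOB_ID
    - docker cp build-$CI_JOB_ID:/usr/src/app/dist ./dist
    - docker cp build-$CI_JOB_ID:/usr/src/app/server.js ./server.js
    - docker rm build-$CI_JOB_ID
    # 3. Build and push the production image from the copied files.
    - docker build -t user/my-typescript-image:latest .
    - docker push user/my-typescript-image:latest
```

If any step after `docker create` fails, the temporary container is left behind on the Docker host, which is one more piece of bookkeeping that the multistage approach avoids.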
We have seen how building Docker images with Gitlab CI has gone from a somewhat involved process, with workarounds involving the Gitlab CI cache, to a first-class experience. Multistage `Dockerfiles` let you run your whole build process with the `docker build` command and reduce the complexity of your Gitlab CI pipelines. This post does not cover every detail, but it should point you in the right direction.