Recently, I’ve been building Docker images, and keeping them small is a recurring concern. A common pattern is to ADD or COPY some data into the image, use it, and then delete it. This bloats the image, because the deleted data still lives in an earlier layer. However, there’s a way to solve this!

The Problem

In my case, I needed to install various Debian packages into the image. However, these packages were not acquired using apt, but instead came from the Docker build context or an HTTP download. In other words, like this:

FROM ubuntu:20.04
# From context supplied by host
COPY /foo.deb /
RUN dpkg -i /foo.deb && rm /foo.deb

or:

FROM ubuntu:20.04
ADD https://server.com/foo.deb /
RUN dpkg -i /foo.deb && rm /foo.deb

The COPY or ADD statement creates a layer in the final Docker image. This layer contains the file no matter what; the later RUN statement simply hides it from the final view of the filesystem, but can’t actually remove the data from the earlier layer. This bloats the image.

The Solution

Docker BuildKit added a --mount option to the RUN statement. Among other things, it gives a RUN command direct access to the filesystem of another stage in a multi-stage build. We can therefore download or otherwise acquire data in one stage, and then use it while building a layer in another stage without ever adding it to that layer’s filesystem. That’s quite a wordy explanation; perhaps an example will help:

FROM ubuntu:20.04 AS downloader
# The trailing slash makes /downloads a directory; the file lands at /downloads/foo.deb
ADD https://server.com/foo.deb /downloads/

FROM ubuntu:20.04
RUN --mount=type=bind,from=downloader,source=/downloads,target=/downloads dpkg -i /downloads/*.deb

Here’s an explanation:

This Dockerfile is a multi-stage build. By default, the last image defined in the Dockerfile is all that ends up being tagged. It’s what you ship to your users, or upload to a Docker image repository. The other image(s) are simply utilities used in the construction of the final image.

The first image contains an ADD statement that downloads the desired file. This file is added to a layer in the “downloader” stage, but since this layer isn’t part of the final stage or image, that’s fine; it’s just thrown away later.

The second image mounts part of the filesystem of the first image. This allows it to access those files via the filesystem in a standard way, but doesn’t actually add those files to the image itself. As an analogy, compare this to a system mounting a network filesystem; the files can be accessed on the system, but aren’t actually stored on the system. Thus, we can access the package to run dpkg -i on it, but don’t end up storing it anywhere in the final image’s filesystem. The exact parameters are:

  • type=bind: Bind-mount a portion of another stage’s filesystem.
  • from=downloader: The stage to mount from.
  • source=/downloads: The directory in the from stage to mount.
  • target=/downloads: The path at which that directory appears while the RUN command executes.
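
The same option also covers the first case from earlier, where the package comes from the build context rather than a download. Here is a minimal sketch, assuming foo.deb sits at the root of the build context; the /tmp/foo.deb target path is just an illustrative choice. When from= is omitted, a bind mount takes its source from the build context, so no extra stage is needed. The first line is the syntax directive described under “Making it work” below.

```dockerfile
# syntax=docker/dockerfile:1.2
FROM ubuntu:20.04
# No from= given, so the source is resolved against the build context.
# foo.deb is visible at /tmp/foo.deb only while this RUN command executes;
# the package file itself is never written into a layer of the image.
RUN --mount=type=bind,source=foo.deb,target=/tmp/foo.deb dpkg -i /tmp/foo.deb
```

Compared with the COPY version, there is no COPY layer holding the package and no rm step to remember.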

Making it work

A couple of actions must be taken to make this all work.

First, this relies on BuildKit. You will probably need to explicitly enable this by setting the environment variable DOCKER_BUILDKIT=1 when running docker build. Apparently docker buildx does this automatically if that command is available in your Docker version.

Second, you must tell Docker that you want to use new syntax. Place the following text at the start of your Dockerfile:

# syntax=docker/dockerfile:1.2

That tells Docker how to parse the Dockerfile. Specifically, it tells Docker to fetch an image named docker/dockerfile with version/tag 1.2 and use it to parse the Dockerfile.
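
Putting the pieces together, the complete file might look like the sketch below. The URL is the same placeholder used above, so substitute your own. Note that the syntax line has to be the very first line; if a comment or instruction comes before it, Docker treats it as an ordinary comment.

```dockerfile
# syntax=docker/dockerfile:1.2
FROM ubuntu:20.04 AS downloader
# Trailing slash: save the download as /downloads/foo.deb
ADD https://server.com/foo.deb /downloads/

FROM ubuntu:20.04
# Mount the downloader stage's /downloads for the duration of this command only
RUN --mount=type=bind,from=downloader,source=/downloads,target=/downloads dpkg -i /downloads/*.deb
```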

Links