Recently, I’ve been working on creating Docker containers. Minimizing the image size is a topic of interest. A common pattern seems to be to ADD or COPY some data into the container, use it, and then delete it. This bloats the image. However, there’s a way to solve this!
The Problem
In my case, I needed to install various Debian packages into the image. However, these packages were not acquired using apt, but instead through the Docker image context or an HTTP download. In other words, like this:
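For the build-context case, a minimal sketch might look like the following; the base image and the package name example.deb are placeholders, not taken from my actual setup:

```dockerfile
# Hypothetical base image; any Debian-based image would do.
FROM debian:bookworm-slim

# Copy the package from the build context. This creates a layer containing the file.
COPY example.deb /tmp/example.deb

# Install the package, then delete the file. The delete happens in a new layer,
# so the copy stored in the COPY layer above is still shipped with the image.
RUN dpkg -i /tmp/example.deb && rm /tmp/example.deb
```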
or:
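For the HTTP-download case, a minimal sketch might look like this; the base image and URL are placeholders:

```dockerfile
# Hypothetical base image; any Debian-based image would do.
FROM debian:bookworm-slim

# Download the package over HTTP. This also creates a layer containing the file.
ADD https://example.com/packages/example.deb /tmp/example.deb

# Install the package, then delete the file. The downloaded copy still lives
# in the ADD layer above, so the image does not get any smaller.
RUN dpkg -i /tmp/example.deb && rm /tmp/example.deb
```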
The COPY or ADD statement creates a layer in the final Docker image. This layer contains the file no matter what; the later RUN statement simply hides it from the final view of the filesystem, but can’t actually remove the data from the earlier layer. This bloats the image.
The Solution
Docker BuildKit added a --mount option to the RUN statement. Among other things, this gives a RUN command direct access to the data of another image in a multi-stage build. We can therefore download or otherwise acquire data in one image, and then access it while building a layer of another image without actually adding it to that target layer’s filesystem. That’s quite a wordy explanation; perhaps an example will help:
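A minimal sketch of such a Dockerfile, assuming a stage named downloader and a /downloads directory to match the mount parameters discussed further down; the base image, URL, and package name are placeholders:

```dockerfile
# syntax=docker/dockerfile:1.2

# Stage 1: fetch the package. Its layers never reach the final image.
FROM debian:bookworm-slim AS downloader
ADD https://example.com/packages/example.deb /downloads/example.deb

# Stage 2: the image that actually gets tagged and shipped.
FROM debian:bookworm-slim

# Bind-mount /downloads from the downloader stage for the duration of this
# RUN command only; the .deb file is never written into this image's layers.
RUN --mount=type=bind,from=downloader,source=/downloads,target=/downloads \
    dpkg -i /downloads/example.deb
```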
Here’s an explanation:
This Dockerfile is a multi-stage build. By default, the last image defined in the Dockerfile is all that ends up being tagged. It’s what you ship to your users, or upload to a Docker image repository. The other image(s) are simply utilities used in the construction of the final image.
The first image contains an ADD statement that downloads the desired file. This file is added to a layer in the “downloader” stage, but since this layer isn’t part of the final stage or image, that’s fine; it’s just thrown away later.
The second image mounts part of the filesystem of the first image. This allows it to access those files via the filesystem in a standard way, but doesn’t actually add those files to the image itself. As an analogy, compare this to a system mounting a network filesystem; the files can be accessed on the system, but aren’t actually stored on the system. Thus, we can access the package to run dpkg -i on it, but don’t end up storing it anywhere in the final image’s filesystem. The exact parameters are:
type=bind: Bind-mount a portion of another stage’s filesystem.
from=downloader: The stage to mount from.
source=/downloads: The directory in the from stage to mount.
target=/downloads: The directory to mount the source directory onto when RUNning the command.
Making it work
A couple of actions must be taken to make this all work.
First, this relies on BuildKit. You will probably need to explicitly enable this by setting the environment variable DOCKER_BUILDKIT=1 when running docker build. Apparently docker buildx does this automatically if that command is available in your Docker version.
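As a rough illustration, enabling BuildKit for a single build could look like this; the image tag example-image is a placeholder, and the command assumes the Dockerfile is in the current directory:

```shell
# Set DOCKER_BUILDKIT=1 for this one invocation of docker build only.
DOCKER_BUILDKIT=1 docker build -t example-image .
```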
Second, you must tell Docker that you want to use new syntax. Place the following text at the start of your Dockerfile:
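Going by the image name and tag described just below, the directive would look like this; it has to be the very first line of the file, before any other comment or instruction:

```dockerfile
# syntax=docker/dockerfile:1.2
```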
That tells Docker how to parse the Dockerfile. Specifically, it tells Docker to acquire the container image named docker/dockerfile with version/tag 1.2 and use it to parse the Dockerfile.