Recently, I’ve been working on creating Docker containers. Minimizing the image size is a topic of interest. A common pattern seems to be to ADD or COPY some data into the container, use it, and then delete it. This bloats the image. However, there’s a way to solve this!
In my case, I needed to install various Debian packages into the image. However, these packages were not acquired using apt, but instead through the Docker build context or an HTTP download. In other words, like this:
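As a minimal sketch of that pattern (the base image, URL, and file name below are placeholders for illustration, not the ones from my actual setup):

```dockerfile
# Hypothetical base image; any Debian-based image would behave the same way.
FROM debian:bullseye

# ADD downloads the package and stores it in its own image layer.
# The URL and file name are placeholders.
ADD https://example.com/some-package.deb /downloads/some-package.deb

# Install the package, then delete the download. The deletion only hides
# the file in this new layer; the ADD layer above still contains it.
RUN dpkg -i /downloads/some-package.deb && rm -rf /downloads
```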
The ADD statement creates a layer in the final Docker image. This layer contains the file no matter what; the later RUN statement simply hides it from the final view of the filesystem, but can’t actually remove the data from the earlier layer. This bloats the image.
Docker BuildKit added a --mount option to the RUN statement. Among other things, this allows direct access to the data of another image in a multi-stage build. We can download or otherwise acquire data in one image, and then access it while building a layer of another image without actually adding it to that target layer’s filesystem. That’s quite a wordy explanation; perhaps an example will help.
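A minimal sketch of such a Dockerfile, using the stage name and mount parameters discussed in this post; the base image, URL, and file name are placeholders chosen for illustration:

```dockerfile
# syntax=docker/dockerfile:1.2

# Stage 1: a throwaway image whose only job is to hold the download.
# The base image, URL, and file name are placeholders.
FROM debian:bullseye AS downloader
ADD https://example.com/some-package.deb /downloads/some-package.deb

# Stage 2: the image that actually gets tagged and shipped.
FROM debian:bullseye

# Bind-mount /downloads from the downloader stage for the duration of this
# RUN command only; the package file is never written into this image's layers.
RUN --mount=type=bind,from=downloader,source=/downloads,target=/downloads \
    dpkg -i /downloads/some-package.deb
```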
Here’s an explanation:
This Dockerfile is a multi-stage build. By default, the last image defined in the Dockerfile is all that ends up being tagged. It’s what you ship to your users, or upload to a Docker image repository. The other image(s) are simply utilities used in the construction of the final image.
The first image contains an ADD statement that downloads the desired file. This file is added to a layer in the “downloader” stage, but since this layer isn’t part of the final stage or image, that’s fine; it’s just thrown away.
The second image mounts part of the filesystem of the first image. This allows it to access those files via the filesystem in a standard way, but doesn’t actually add those files to the image itself. As an analogy, compare this to a system mounting a network filesystem; the files can be accessed on the system, but aren’t actually stored on the system. Thus, we can access the package to run dpkg -i on it, but don’t end up storing it anywhere in the final image’s filesystem. The exact parameters are:
type=bind: Bind-mount a portion of another stage’s filesystem.
from=downloader: The stage to mount from.
source=/downloads: The directory in the from stage to mount.
target=/downloads: The directory to mount the source directory onto when RUNning the command.
Making it work
A couple of actions must be taken to make this all work.
First, this relies on BuildKit. You will probably need to explicitly enable it by setting the environment variable DOCKER_BUILDKIT=1 when running docker build. docker buildx does this automatically if that command is available in your Docker version.
Second, you must tell Docker that you want to use new syntax. Place the following text at the start of your Dockerfile:
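Given the image name and tag used in this post, that line would be the following syntax directive, which has to come before any other instruction or comment in the file:

```dockerfile
# syntax=docker/dockerfile:1.2
```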
That tells Docker how to parse the Dockerfile. Specifically, it tells Docker to fetch the image named docker/dockerfile with tag 1.2 and use it to parse the Dockerfile.