Reproduce Docker images produced by Guix
shell the smoothie
Docker images are smoothies, right? They lack transparency, and it is hard, if not impossible, to know whether they contain strawberry or whale oil, right? Although containers are an efficient way to ship things, the core question is how these things are produced.
The aim of this post is to demonstrate that the issue is not Docker images themselves. Instead, the concrete question when speaking about reproducibility is: where do the binaries come from, and which tool supplies them?
This scenario was initially written as a comment when reviewing patch#45919.
Alice generates
Alice is working on a standard scientific stack using Python. Therefore, she stores alongside her project the files manifest.scm, containing the package set, and channels.scm, containing the state of Guix (in other words, the version). Having these two files makes it possible to replay the exact same computational environment using guix time-machine.
Concretely, manifest.scm reads,
(specifications->manifest (list "python" "python-numpy"))
and guix describe -f channels returns,
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "fb32a38db1d3a6d9bc970e14df5be95e59a8ab02")
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
So far, so good. Because Alice needs to run this stack on some infrastructure that does not run Guix but runs Docker instead, she simply packs her scientific stack with something along these lines,
$ guix pack -f docker --save-provenance -m manifest.scm
The next step depends on the setup. One solution is to load the generated tarball locally using the Docker tools, something along these lines,
$ docker load < /gnu/store/6rga6pz60di21mn37y5v3lvrwxfvzcz9-python-python-numpy-docker-pack.tar.gz
Loaded image: python-python-numpy:latest
$ docker images
REPOSITORY            TAG      IMAGE ID       CREATED        SIZE
python-python-numpy   latest   ea2d5e62b2d2   51 years ago   431MB
then docker push to a convenient registry. The second solution is to transfer the tarball, like any other data, to the other infrastructure and run the same Docker commands over there.
For the sake of the demonstration, on the other machine, it just works:
$ docker run -ti python-python-numpy:latest python3
Python 3.8.2 (default, Jan 1 1970, 00:00:01)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> A = np.array([[1,0,1],[0,1,0],[0,0,1]])
>>> _, s, _ = np.linalg.svd(A); s; abs(s[0] - 1./s[2])
array([1.61803399, 1.        , 0.61803399])
0.0
>>> quit()
Neat!
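Why is the last value exactly 0.0? A short sketch, using only the Python standard library and not part of Alice's original session, re-derives the singular values by hand: they are the square roots of the eigenvalues of AᵀA, and for this particular matrix the largest and smallest ones turn out to be the golden ratio and its inverse. The variable names below are illustrative.

```python
import math

# A = [[1,0,1],[0,1,0],[0,0,1]] as in the session above.
# The singular values of A are the square roots of the eigenvalues of A^T A.
# Here A^T A = [[1,0,1],[0,1,0],[1,0,2]]: the middle row and column decouple
# (eigenvalue 1), which leaves the symmetric 2x2 block [[1,1],[1,2]].
trace = 1 + 2          # trace of the 2x2 block
det = 1 * 2 - 1 * 1    # determinant of the 2x2 block

# Eigenvalues of the 2x2 block from its characteristic polynomial
# x^2 - trace*x + det = 0.
disc = math.sqrt(trace * trace - 4 * det)
lam_max = (trace + disc) / 2
lam_min = (trace - disc) / 2

# Sorted in decreasing order, as np.linalg.svd reports them.
singular_values = [math.sqrt(lam_max), 1.0, math.sqrt(lam_min)]
golden_ratio = (1 + math.sqrt(5)) / 2

print(singular_values)
print(math.isclose(singular_values[0], golden_ratio))
print(math.isclose(singular_values[0] * singular_values[2], 1.0))
```

Because the 2x2 block has determinant 1, the product of the largest and smallest singular values is 1, which is what `abs(s[0] - 1./s[2])` checks numerically.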
On a side note, the Docker image is produced directly by Guix. In other words, Guix manages everything, from the binary packages and all their requirements to the Docker image itself; no Dockerfile is involved. This Docker image is just one container format among many others. For instance, guix pack -f squashfs --save-provenance -m manifest.scm will generate a Singularity image (another container format) with the exact same binaries inside.
Bob redoes later and elsewhere
Bob works with Alice’s Docker image. He needs to run these exact same versions on another infrastructure using plain relocatable tarballs, for example. Or he needs to scrutinize how all the binaries in this stack were produced, because maybe he found a bug and wants to know whether all the results obtained with this Docker image are correct, or maybe he wants to study a specific aspect to better understand a specific result. Well, Bob is doing Science and thus Bob needs transparency.
The files manifest.scm and channels.scm sadly disappeared a long time ago, probably at the end of Alice’s postdoc. If the Docker image had been produced with a Dockerfile, then game over! Fortunately, Bob remembers that this Docker image was produced with Guix (pack --save-provenance). Let’s get the recipe of the smoothie.
Here are the tricks! First, start the container, which eases the export to a plain tarball. Second, extract the embedded Guix profile.
$ docker run -d python-python-numpy:latest python3
e1775ff836915dc55195eafd1710eec07106bd1677bde153e5842a0ded43395d
$ docker export -o /tmp/re-pack.tar $(docker ps -a --format "{{.ID}}"| head -n1)
$ tar -xf /tmp/re-pack.tar $(tar -tf /tmp/re-pack.tar | grep 'profile/manifest')
$ tree gnu
gnu
└── store
    └── ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile
        └── manifest

2 directories, 1 file
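For readers who prefer scripting this step, the tar-and-grep extraction above can be sketched in Python with the standard tarfile module. This is only an illustration: the function name `extract_profile_manifests` is hypothetical, and the sketch assumes the exported tarball stores the profile's manifest as a regular file under a relative path, as the tree output suggests.

```python
import tarfile
from pathlib import Path


def extract_profile_manifests(tar_path, dest="."):
    """Copy every member whose name contains 'profile/manifest' into DEST.

    Hypothetical helper mirroring `tar -tf ... | grep 'profile/manifest'`
    followed by `tar -xf`. Returns the list of paths that were written.
    """
    written = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            name = Path(member.name)
            # Skip anything that could escape DEST.
            if name.is_absolute() or ".." in name.parts:
                continue
            # Same substring filter as the grep in the shell session.
            if member.isfile() and "profile/manifest" in member.name:
                target = Path(dest) / name
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(tar.extractfile(member).read())
                written.append(target)
    return written
```

Calling `extract_profile_manifests("/tmp/re-pack.tar")` from the current directory would recreate the same gnu/store/…-profile/manifest layout as the shell commands, which is all that guix package -p needs next.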
Wow! Is it really a regular profile? Yes, it is!
$ guix package -p gnu/store/ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile --export-channels
;; This channel file can be passed to 'guix pull -C' or to
;; 'guix time-machine -C' to obtain the Guix revision that was
;; used to populate this profile.

(list
     (channel
       (name 'guix)
       (url "https://git.savannah.gnu.org/git/guix.git")
       (commit
         "fb32a38db1d3a6d9bc970e14df5be95e59a8ab02")
       (introduction
         (make-channel-introduction
           "9edb3f66fd807b096b48283debdcddccfea34bad"
           (openpgp-fingerprint
             "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))
)

$ guix package -p gnu/store/ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile --export-manifest
;; This "manifest" file can be passed to 'guix package -m' to reproduce
;; the content of your profile.  This is "symbolic": it only specifies
;; package names.  To reproduce the exact same profile, you also need to
;; capture the channels being used, as returned by "guix describe".
;; See the "Replicating Guix" section in the manual.

(specifications->manifest
  (list "python" "python-numpy"))
Awesome, isn’t it? These last two outputs are equivalent to Alice’s manifest.scm and channels.scm. In other words, once they are saved as new-manifest.scm and new-channels.scm, running the following, whenever and wherever[1],
$ guix time-machine -C new-channels.scm \
       -- pack -f docker --save-provenance -m new-manifest.scm
should produce the exact same docker-pack.tar as before. If not, raise your hand and open a bug.
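One way Bob could check the "exact same" claim, assuming he still has Alice's original tarball at hand, is to compare cryptographic hashes of the two files. The sketch below uses only the Python standard library; the helper names `sha256_of` and `same_content` are illustrative and not part of Guix.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of the file at PATH."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read in chunks so that large tarballs do not need to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Tell whether the two files are bit-for-bit identical (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If `same_content` returns False for Alice's tarball and the one Bob rebuilt, that is precisely the situation worth reporting as a bug.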
Join the fun, join GNU Guix!
Footnotes:
[1] Wherever Guix is installed, at least. :-)