Back from ACM Rep 2024

I was really happy to attend the ACM Conference on Reproducibility and Replicability (ACM Rep 2024) in Rennes, France.

I presented the collaborative work Source Code Archiving to the Rescue of Reproducible Deployment, in other words, how Guix and Software Heritage are connected. For more details, please check out the article and the slides. It is a multi-year effort and many people contributed to make it happen. Thank you to Antoine R. Dumont and Antoine Lambert (SWH team members), Antoine Eiche (personal website) and Timothy Sample (personal website), supported by the Alfred P. Sloan Foundation, to name only some of the people involved in the discussions.

The key is content addressing! Both Guix and Software Heritage rely on “intrinsic identifiers” to identify source code. Please use¹ intrinsic identifiers instead of version labels; see SWHID.
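To make “intrinsic identifier” concrete, here is a minimal sketch of how a SWHID for a single file content can be computed. It assumes the content case only (the `cnt` object type), where the identifier is the same SHA-1 that Git computes for a blob; directories, revisions and releases follow their own rules and are not covered. The function name `swhid_for_content` is mine, not part of any Software Heritage tool.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return the SWHID of a file content (object type "cnt").

    Illustrative helper: the hash is computed like a Git blob,
    i.e. SHA-1 over the header "blob <size>\\0" followed by the bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


# Two files that could both be published under the label "1.0":
# the label does not tell them apart, the identifiers do.
first = b"print('hello')\n"
second = b"print('hello, world')\n"

print(swhid_for_content(first))
print(swhid_for_content(second))
```

The identifier is derived from the bytes themselves, so anyone holding the file can recompute it and check it, whereas a version label can silently be moved to different content.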

The other ingredient is Disarchive. For source code managed by a version-control system (VCS) such as Git, Mercurial or Subversion, nothing special is required: “only” a correspondence table between two content-addressing systems. However, for compressed tarballs, another component, named the Disarchive database, is required since SWH archives content and not “metadata” (e.g., compression level). Hence, Disarchive extracts the metadata from the compressed tarball and stores it, while the content is archived by Software Heritage. Later, Guix queries both Software Heritage and the Disarchive database in order to rebuild the exact same compressed tarball.
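The split between “content” and “metadata” can be illustrated with a toy model. This is only a sketch of the idea, not Disarchive's actual format or algorithm: it handles a bare gzip stream rather than a real tarball, the names `disassemble` and `assemble` are made up, a plain dictionary stands in for the Software Heritage archive, and the compression level is recovered by simply trying each one with Python's own `gzip` module.

```python
import gzip
import hashlib
import struct


def disassemble(compressed: bytes) -> tuple[dict, bytes]:
    """Split a gzip file into (metadata, content). Toy model only."""
    content = gzip.decompress(compressed)
    # The gzip header stores the modification time in bytes 4..8 (little-endian).
    (mtime,) = struct.unpack("<I", compressed[4:8])
    # Hypothetical strategy: try each level until the bytes match exactly.
    for level in range(1, 10):
        if gzip.compress(content, compresslevel=level, mtime=mtime) == compressed:
            metadata = {
                "content": hashlib.sha256(content).hexdigest(),
                "level": level,
                "mtime": mtime,
            }
            return metadata, content
    raise ValueError("no known gzip setting reproduces this file")


def assemble(metadata: dict, archive: dict) -> bytes:
    """Rebuild the compressed file from the stored metadata and content."""
    content = archive[metadata["content"]]
    return gzip.compress(
        content, compresslevel=metadata["level"], mtime=metadata["mtime"]
    )


payload = b"source code\n" * 1000
original = gzip.compress(payload, compresslevel=6, mtime=1718000000)

metadata, content = disassemble(original)
archive = {metadata["content"]: content}  # stand-in for the content archive

rebuilt = assemble(metadata, archive)
print(rebuilt == original)
```

The same content compressed with another timestamp or level gives different bytes, hence a different hash for the tarball. That is why archiving the content alone is not enough to get back the file that a package definition refers to.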

To my knowledge, this makes Guix the first free software distribution and tool backed by the stable Software Heritage archive.

Although some challenges remain before we have a truly bulletproof tool, this work paves the way toward being able to fully manipulate the whole software stack, both today and tomorrow.

Interested? Join the fun, join Guix for Science!

Spotlight on three contributions

All the talks were worth attending and most of the time I was like « Wow! Compelling! ». Next time, you should join the conference too! 😀 Well, as a note for my future self, I would like to point out these three works:

The Impact of Hardware Variability on Applications Packaged with Docker and Guix: a Case Study in Neuroimaging (article)

we study the effect of nine different CPU models using two software packaging systems (Docker and Guix), and we compare the resulting hardware variability to numerical variability measured with random rounding.

Gael Vila, Emmanuel Medernach, Ines Gonzalez Pepe, Axel Bonnet, Yohan Chatelain, Michael Sdika, Tristan Glatard, Sorina Camarasu Pop

Longevity of Artifacts in Leading Parallel and Distributed Systems Conferences: a Review of the State of the Practice in 2023 (article)

By reviewing the methods and tools used to create and share artifacts in a technical, in-depth, and article content-agnostic manner, we found that the state of practice does not address reproducibility in terms of artifact longevity and we expose eight observations that support this finding.

Quentin Guilloteau, Florina Ciorba, Millian Poquet, Dorian Goepp, Olivier Richard

Embracing Deep Variability For Reproducibility and Replicability (article)

we delve into the application of software engineering techniques, specifically variability management, to systematically identify and explicit points of variability that may give rise to reproducibility issues (eg language, libraries, compiler, virtual machine, OS, environment variables, etc).

Mathieu Acher, Benoit Combemale, Georges Aaron Randrianaina, Jean-Marc Jezequel

Footnotes:

¹ Yes, this is Zooko’s triangle, but here being human-readable appears less important than being secure and decentralized.