What could Guix offer in computational medical environments?
Guix as a medical device?
Note: Thanks to Samuel Quentin from Assistance Publique Hôpitaux de Paris (APHP) for the invitation to talk about Guix. The experience was great! Previously, Sam worked at the Saint Louis Research Institute (IRSL), where we experimented together with workflows and explanations about Guix. Sam is now working on new challenges and asked me to speak about Guix at an internal technical meeting of his new team (SeqOIA / MOABI). Sam, thank you very much for this opportunity.
As you know, “personalized”[1] medicine is highly quantitative and relies on processing a wide range of data. Think, for instance, of the results of a very common blood test. For cancer, it goes even further: large amounts of data are intensively processed to obtain a more accurate diagnosis and a more precise treatment. Starting from genomic sequencing data, the processing is a multi-step pipeline (workflow), and each step (unit) involves different software. Therefore, from where I stand, the very first question reads: how do we control everything about this software?
Here are the slides (PDF) of the talk (in French).
In France, we have a dedicated agency (Agence Nationale de Sécurité du Médicament et des produits de santé, ANSM) that issues rules and regulations about medicines, drugs and the many components involved in treating a disease. What do they say about software? My translation of Logiciels et applications mobiles en santé:
Some of this software counts as a medical device.
Ok, and what are the constraints on a medical device? My translation of Traçabilité des dispositifs médicaux:
unambiguous identification of the medical device, which must expose:
- reference of the product
- reference of the maker
- serial number
Does it ring a bell? It sounds like a story similar to Software Bill of Materials (SBOM) or software supply chain risk management. I will not repeat it here; see,
Guix blog post: Identifying software
Interlude: Debian
Let me point out the talk (DebConf 19) by Roche Diagnostics about “Building Medical Devices with Debian”: they « share their experience with migrating a medical device to Linux, debianizing and developing software on Debian. Furthermore […] demonstrate their Debian toolchain and share how to create reproducible and upgradeable Debian environments for development, test and production on multiple architectures. »
Let me also mention the talk (MiniDebConf 24) about “Reproducible Builds - rebuilding what is distributed from ftp.debian.org”. Assuming one needs to verify the software crun as it was built on January 16th, 2024, it is possible to get the build information and rebuild, for instance,
$ wget https://buildinfos.debian.net/ftp-master.debian.org/buildinfo/2024/01/16/crun_1.13-1_amd64.buildinfo
$ debrebuild crun_1.13-1_amd64.buildinfo
Well, inspecting the .buildinfo file is left as an exercise. And more homework is available in the BoF (DebConf 24) panel “preserving other build artifacts”.
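As a starting point for that exercise, here is a small sketch of mine (not something shown in the talks). It assumes the .buildinfo file follows the usual Debian control-file layout, where a field starts at the beginning of a line and its continuation lines are indented. The helper name buildinfo_field is hypothetical.

```shell
#!/bin/sh
# Print the value of one field of a Debian control-style file,
# including its indented continuation lines (one entry per line).
# Usage: buildinfo_field FIELD FILE
buildinfo_field() {
    awk -v field="$1" '
        # The requested field starts here: print what follows the colon, if anything.
        $0 ~ ("^" field ":") {
            inside = 1
            sub("^" field ":[ ]*", "")
            if (length($0)) print
            next
        }
        # Indented lines right after it still belong to the field.
        inside && /^[ \t]/ { sub(/^[ \t]+/, ""); print; next }
        # Any other line closes the field.
        { inside = 0 }
    ' "$2"
}

# Example, after the wget above:
#   buildinfo_field Installed-Build-Depends crun_1.13-1_amd64.buildinfo
```

The field Installed-Build-Depends is the interesting one here: it lists the exact versions of the packages that were installed when the binary was built, which is what a rebuild has to restore.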
Back to the main stage
When I discuss software identification, most of the time people focus either on identifying source code or on preserving binaries. Here is the main message: both are required! Software is dual: we read source code but we run binary programs. Therefore, we must provide:
- a source code identifier, e.g., SoftWare Hash IDentifier (SWHID);
- an identifier capturing the transformation from source code to binary.
Based on the medical device requirements above for traceability, the analogy holds,
- reference of the product \(\Leftrightarrow\) source code identifier;
- reference of the maker \(\Leftrightarrow\) transformation identifier.
And the third item, the serial number identifying the binary, is a redundant identifier deduced from the other two.
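To make the first identifier concrete, here is a minimal sketch for the simplest case only: the SWHID of a single file's content (swh:1:cnt:…), which reuses Git's blob hashing, i.e. SHA-1 over the header "blob <size>\0" followed by the raw bytes. The function name swhid_content is mine, and the sketch assumes sha1sum from coreutils is installed. Directories, revisions and releases have their own SWHID types and are not covered here.

```shell
#!/bin/sh
# Compute the SWHID of one file's content (object type "cnt").
# Usage: swhid_content FILE
swhid_content() {
    file=$1
    # Size in bytes; tr strips the padding some wc implementations add.
    size=$(wc -c < "$file" | tr -d ' ')
    # Git-style blob hashing: header "blob <size>" + NUL byte, then the content.
    digest=$( { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1 )
    printf 'swh:1:cnt:%s\n' "$digest"
}

# Example:
#   swhid_content README
```

For a file tracked in Git, the hexadecimal part should be the same value that git hash-object prints for that file.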
One way or another, any package manager performs this transformation from source code to binary. The core question is thus: what do we need to keep from the package manager to be able to inspect this transformation? As mentioned just above about Debian, it's not straightforward. This is where Guix shines!
A good entry point to Guix is the article,
Toward practical transparent verifiable and long-term reproducible research using Guix
Nature Scientific Data, vol. 9, num. 597, Oct. 2022.
or the 1h video Tour of Guix (in French). Or check out these presentations:
In a nutshell, when using Guix, the transformation from source code to binary is captured by the file channels.scm: this file lists inherent identifiers. Roughly speaking, these identifiers are Git commit hashes that exactly pinpoint what to build, how to build it, and recursively so for each dependency.
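For readers who have never seen one, here is a sketch of a minimal channels.scm pinning only the main guix channel. The helper write_channels is hypothetical, the commit is whatever hash you pass in, and the URL is the one I assume was in use when this was written. Real files usually also carry a channel introduction, left out here for brevity.

```shell
#!/bin/sh
# Emit a minimal channels.scm pinning the 'guix channel to one commit.
# Usage: write_channels COMMIT > channels.scm
write_channels() {
    commit=$1   # full Git commit hash of the Guix channel to pin
    cat <<EOF
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "$commit")))
EOF
}

# In practice there is no need to write the file by hand:
#   guix describe --format=channels > channels.scm
# records the channels (and commits) of the Guix currently in use.
```

The point is that the whole state of the package collection is reduced to a handful of commit hashes, small enough to store next to the results of an analysis.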
In other words, knowing this file channels.scm allows one to inspect and rebuild the exact same software: the exact same source code built using the exact same recipe and relying on the exact same dependencies. For example, the command
guix time-machine -C channels.scm -- pack -f docker bwa
will produce the exact same Docker pack containing the binary program of the Burrows-Wheeler Aligner (bwa) for short-read alignment, now or one year later, given the very same file channels.scm. As another example,
guix time-machine -C channels.scm -- build bwa -d
outputs the exact same plumbing recipe for transforming the source code of bwa into the binary; this recipe includes an identification of the source code, of the dependencies and of the build script.
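One way to check the "exact same" claim for yourself is to compare cryptographic digests of two packs, built on different machines or at different dates from the same channels.scm. The sketch below is only a sketch: the helper names digest_of and same_artifact are mine, and it assumes sha256sum from coreutils is installed.

```shell
#!/bin/sh
# Print the SHA-256 digest of a file.
digest_of() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Succeed only when both files have the same SHA-256 digest,
# i.e. when they are bit-for-bit identical for all practical purposes.
same_artifact() {
    [ "$(digest_of "$1")" = "$(digest_of "$2")" ]
}

# Example (requires Guix; guix pack prints the path of the pack it built):
#   pack=$(guix time-machine -C channels.scm -- pack -f docker bwa)
#   same_artifact "$pack" /path/to/pack-built-elsewhere && echo "identical"
```

If the digests differ, something in the build is not deterministic, and that is precisely the kind of information a traceable supply chain should surface.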
Not convinced? Check out When Docker images become fixed-point.
If each step of your workflow is itself one Docker image, no big deal: you might be interested in Guix: a factory for containers?
Opinionated conclusion
I do not have any particular details about the current situation of software medical devices in hospitals. Maybe they are already using a robust system to identify the whole software supply chain. If not, I hope the direction is toward a transparent and long-term reproducible software supply chain.
Today, it's technically doable to have full transparency and software identification. The main roadblocks seem to be technical habits and political choices; traceable software medical devices are not some unreachable horizon but a reachable goal!
However, a similar discussion about the workflows (pipelines) remains, in my humble opinion. Many things are already on the table, e.g., ShareFAIR, but interoperable and shareable protocols (workflows) are still an ongoing open question, to my knowledge.
Well, going further and dreaming, the unreachable horizon would be to have a complete health system based only on Free Software and Free Hardware for all the medical devices. The complete health system seen as a Common Good. Arf! Still a lot on the plate…
Footnotes:
[1] Implicitly, in the countries that have the resources and money.