Why GuixHPC matters

Grounding the road taken by Guix, from a scientific practitioner's point of view

The recent post about PyTorch opens various discussions, for the better. I feel some misunderstandings are floating around, so here is my point of view.

When presenting Guix to scientific folk, we often hear back: «nice, but let's get back to addressing real problems». Why are Guixers investing so much effort when others do not understand why they are doing so?

Behind the terms Science, Technology, Engineering or Security, we all have an (implicit) idea of the concept, but because it is not possible to define them clearly, we are not speaking about exactly the same thing.

For instance, most of us would agree that cooking is not Science, despite the fact that cooking is well documented (recipe books), reproducible (the cake always has the same taste) and brings knowledge (cook this and that and the result is yellow foam). One might object that cooking is based on the rules of chemistry. Yes, just as fluid mechanics is based on the rules of molecular dynamics (themselves based on the rules of quantum mechanics). Well, it is hard to define what Science is: if we try to lay down a definition, then either cooking is part of Science, or cooking is excluded but so is another field usually considered Science, such as fluid mechanics.

There are many, many books trying to define Science. It is a really, really hard problem, known as the demarcation problem.

Not being able to define something explicitly often leads to dogma, i.e., a philosophical tenet, i.e., a philosophical principle on which a belief is based.

Instead of arguing about these philosophical grounds and their implications, we bikeshed. It leads to debates such as “pragmatism” vs “idealism”, “Worse is Better” vs “The Right Thing”, “reasonable” vs “fundamentalist”, etc.

The real question we should ask is: where does the scientific replication crisis come from? Why do we have to say “Reproducible Science”[1] when science always contains reproducibility in its definition? Is the activity of independently redoing a study not the guardian of a universal value?

From my understanding, part of this reproducibility crisis comes from, on the one hand, an increase in the complexity of the whole chain used to study an object and, on the other hand (and at the same time), a decrease in the transparency of this very same chain.

That said, we somehow accept some variability in the result; this variability comes from the object under study itself, from the tools (instruments or software) involved in the study (observing, measuring, analysing, etc.), or from both.

To be able to fully understand the study of an object, one has to fully control this variability. Do not get me wrong: it is hard, if not impossible, to fully control what happens inside a Petri dish, for sure. The key is to be able to introspect the tool itself. This introspection means transparency: being able to audit and challenge this very same tool.

It is not black and white, without a doubt, and most of the time it is grey. The reproducibility crisis exists exactly because it is grey! The more black boxes in the chain, the greyer the picture becomes, and the higher the probability of producing an unverifiable result. Bang! Reproducibility crisis.

Here is my aim when working with the GuixHPC project: make computations white boxes.

It is easier to deem everything that happens in computers, from laptop to cluster, a grey box. Considering computers like any other instrument (microscopes, DNA sequencers, telescopes, chronometers, etc.) is what most of us do most of the time. Do I believe computers should be as white a box as possible?

Yes, I do.

I do believe that producing Science makes it vitally important to fully control the variability introduced by the whole chain. Computers (and computations) are tools among many others, and they are no exception. They must be challenged and audited to verify all the arguments leading to scientific knowledge. Full transparency is thus a corollary of scientific production.

If I had skills in optics, I would invest energy in producing a transparent and auditable microscope. But my skills are in programming, so I help to build a transparent and auditable software stack.

I do believe that producing knowledge requires transparency and the strong ability to audit the whole chain that outputs this very same knowledge. It is a philosophical position on what Science is.

The GuixHPC project is, in my humble opinion, an attempt to concretely implement such a position for computing.



[1] Random query using the term: A manifesto for reproducible science.

© 2014-2024 Simon Tournier <simon (at) tournier.info >

(last update: 2024-06-05 Wed 09:47)