Lightning FOSDEM talk: Reproducible Workflow?
Guixifying workflow management system: past, present, maybe future?
PDF slides and video (start: ~2min; end: ~6min)
The lightning talk is around 3 minutes – as quick as light! Barely started and, bang, it’s already done. To be honest, I was very skeptical about my ability to deliver my message under such a tight time constraint, especially since I feel below average as a speaker and my English flow is chaotic, to put it mildly.
When preparing the slides, my mindset was: but why? Why did I say yes to a lightning talk? That morning, I was very stressed – as often, I suffered from stage fright, or I had butterflies in my stomach – well, more than usual. Therefore, because 3 minutes is too short to search for words and because I was nervous, I wrote down all the bullets of my speech. Then, I repeated the speech many, many times – removing bullets, rewording, adjusting, etc. – until the talk fit into the 3 minutes; my final hand-written text is reproduced below. Sadly, I’m very poor at learning by heart, so I repeated it again and again, hoping to acquire a sufficiently automatic delivery. You be the judges!
All in all, I’ve enjoyed the experience! Sure, I’d do things differently next time. And yes, I hope to give other lightning talks. Regular talks are long drinks where you prepare all the ingredients, shake and pour, while lightning talks are shooters of one fine spirit distilled with precision. Two flavors of being heady.
Abstract
Bioinformatics and Computational Biology face a twofold challenge. On the one hand, the number of steps required to process the data is becoming larger and larger, and each step implies software involving more and more dependencies. On the other hand, Reproducible Research requires the ability to deeply verify and scrutinize all the processes, and Open Science calls for the ability to reuse, modify or extend them.
A workflow can be transparent and reproducible if and only if it’s built on top of a package manager that allows, over time, fine control over the set of dependencies as well as the ability to scrutinize or adapt them.
The first story is Guix Workflow Language (GWL): a promise that has not reached its potential. The second story is Concise Common Workflow Language (CCWL): compiling Guile/Scheme workflow descriptions to CWL[1] inputs. The third story is Ravanan: a CWL implementation powered by Guix – a transparent and reproducible package manager.
This talk is a threefold short story that makes one point: long-term, transparent and reproducible workflows need package managers first.
(hand-written) Speech
- Bioinformatics and Computational Biology, well, all scientific production, is about trust: verification, sharing, and so on.
- One means to this trust is reproducibility. We want reproducible workflows! We want long-term reproducible workflows.
- What is a workflow?
- A workflow is a chain of computational steps to get results from data.
- In real life, it’s more than one linear pipe; it’s a dataflow graph, with concurrent steps and multiple inputs and outputs.
- What do we need for a long-term reproducible workflow?
- We need to describe:
- the whole computational environment (the binaries used by each box),
- and the graph itself.
- Most of the time, the left and right sides of this « and » are considered two orthogonal concerns.
- We need both sides!
- Reproducible binaries, you guessed it: Guix!
- And a way to describe such a graph, taking into account both:
- the dataflow,
- and the computational environment.
- The question becomes: how do we plug the dataflow graph and Guix together?
- Wait, what’s Guix?
- Guix deals with long-term reproducible computational environments.
- Check out the many past FOSDEM talks. Self-promotion: here and an article[2] written with colleagues. Visit this webpage about Guix usage in scientific contexts.
- Back to the question: how do we plug the workflow description and Guix together?
- The idea is:
- Package managers deal with Directed Acyclic Graphs (DAGs),
- Workflow engines too!
- Can we build a workflow engine on top of a package manager?
- That was the idea behind the Guix Workflow Language: implement a workflow domain-specific language (DSL) on top of Guix (with Guix used here as a regular Scheme/Lisp library).
- Adoption? I guess you’ve never heard of it!
- Well, it’s yet another language where you count the closing parentheses. And as all the examples around show, it’s a highly competitive market where each community develops its own…
- GWL adoption? Indeed, it was not adopted, maybe mainly because the idea was too disruptive at the time. Still, the idea is sound!
- The next recycled idea reads:
- It uses Guix to deal with the computational environment.
- Still some pitfalls, but super promising! Join us!
- Time’s up!
- The computational environment must be the first brick of a long-term reproducible workflow.
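To make the "workflows are DAGs too" point a bit more concrete, here is a minimal sketch in plain Python. It is not GWL, CCWL or Ravanan code, and it does not call Guix at all: the step names and package names are made up for illustration, and the `packages` field is only a label standing in for the computational environment each box would need. It merely shows that a workflow described as a graph can be ordered, and its concurrent steps found, with the same topological sorting a package manager relies on (here via the standard-library `graphlib` module, Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step lists the steps it depends on ("after")
# and the packages it would need. All names are invented for illustration.
workflow = {
    "trim":   {"after": [],              "packages": ["trimmer"]},
    "align":  {"after": ["trim"],        "packages": ["aligner"]},
    "qc":     {"after": ["trim"],        "packages": ["qc-tool"]},
    "report": {"after": ["align", "qc"], "packages": ["report-tool"]},
}


def execution_order(steps):
    """Return one valid order in which the steps can run."""
    graph = {name: set(spec["after"]) for name, spec in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def batches(steps):
    """Group steps into batches whose members could run concurrently."""
    graph = {name: set(spec["after"]) for name, spec in steps.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose inputs are all done
        result.append(ready)
        sorter.done(*ready)
    return result


if __name__ == "__main__":
    for batch in batches(workflow):
        needed = sorted({p for step in batch for p in workflow[step]["packages"]})
        print(batch, "needs", needed)
```

In a real engine, the `packages` label of each step would be handed to the package manager to build the environment for that box; that part is deliberately left out here.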
Join the fun, join Guix!
Footnotes:
[1] Common Workflow Language: powered by one specification.
