If you are missing context about Software Heritage, feel free to have a look at my personal questions and answers from a session held last year.
What a pleasant meeting with Software Heritage enthusiasts! So refreshing to team up with people from very diverse backgrounds and to work all together, mixing ideas, feedback, and experience.
How can we collectively define what the Software Heritage community needs to do and to be in the near future? Think about ideas for each, then discuss them with your nearest neighbor, one-to-one. Time’s up: each group of two people must settle on just one idea for “to do” and one for “to be”. Now, form groups of four and try to merge the two ideas for each item. Time’s up: the result must again boil down to only one idea for “to do” and one for “to be”. Now, form groups of eight and repeat. Then group again and repeat. This merging strategy leads to a consensus, with improvements or better wording along the way. The outcome of the last step reads: we would like the Software Heritage community,
- to be more inclusive to other skill diverse communities beyond computer science;
- to be globally recognized as the reference for managing the software life cycle in all fields.
- to create special interest groups to get together with fixed goals and timeline;
- to continuously increase outreach and usability, open to the world.
Great, isn't it? They appear to me as excellent outcomes¹!
Well, time is part of the constraint. For example, with a bit more time, I would have suggested another wording for “beyond computer science”, since I do not identify myself as part of a computer science community; maybe my proposal could have been “beyond communities focused on computing”. Anyway, the interesting outcome is the collective discussion, not a perfect final wording.
Another piece of personal feedback. My initial “idea” – to be consistent with the
Software Heritage community’s recommendations, how many scientists in the room
have archived the source code of their last publication? And mentioned the
SWHID identifier? – was lost in translation in the merging branch I
belonged to but, and that’s what I find very interesting, the “idea” somehow
appeared in the other branch. The outcome is thus a consensus, no?
Then other working groups, using other strategies, led to other discussions.
For instance: listing actions for editorial offices, drafting text to convince
a board to archive software and use SWHID identifiers, developing different
use-cases for SWHID identifiers, etc.
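For readers who have never seen a SWHID, here is a minimal Python sketch of how the identifier of a single file content can be computed. It is not the official Software Heritage tooling, only an illustration, and it assumes the `swh:1:cnt:` scheme in which the hash is the same one Git computes for a blob. The function name `content_swhid` is mine.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the core SWHID of one file content (object type 'cnt').

    Illustrative helper, not part of any Software Heritage library.
    """
    # Same framing as a Git blob: the header "blob <length>\0",
    # followed by the raw bytes of the file.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # Identifier of a file containing the single line "hello world".
    print(content_swhid(b"hello world\n"))
```

This covers only the simplest case, a single file. Directories, revisions, releases, and snapshots have their own object types and hashing rules, so for a whole repository it is better to rely on the official tooling or on the archive itself.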
All in all, I am very happy to have attended this fruitful event! Thanks again for organizing. And last but not least, that’s also a nice opportunity to meet in person folks with whom we collaborate online and just to catch up on fresh news of life™.
Edit: The next day, I attended the Software Heritage Symposium and Summit hosted at UNESCO. Thanks again to the Software Heritage team for the organization, awesome event! I am very grateful to Roberto Di Cosmo. Fun fact: Stefano Zacchiroli and I were sitting one seat apart, and we met in person for the first² time there, although we are currently exchanging emails.
The afternoon was intense! Various panels with speakers from diverse backgrounds. It is always interesting to attend events which bridge institutional, longer-term speeches with day-to-day challenges.
Let me point out two scientific challenges: graph compression and Large Language Models (LLMs). The graph compression presentation by Sebastiano Vigna echoed the presentation given last year by Paolo Ferragina (see slides).
About LLMs, the idea is to use the Software Heritage archive to train machine learning models that can automatically generate code to assist with software development tasks. Obviously, this raises legal and ethical questions. Hence the Software Heritage Statement on Large Language Models for Code: « We [Software Heritage] feel that the question is no longer whether LLMs for code should be built. They are already being built, independently of what we do, and there is no turning back. The real question is how they should be built and whom they should benefit.
In alignment with our mission, we believe that LLMs for code should be built in a transparent and respectful way, to the benefit of all. We hence state the following principles for acceptable machine learning use of the Software Heritage archive. »
These 3 principles appear to me worth reading… and sharing!