Introducing the inaugural Birdaro training program cohort: Part 4
At the end of last month we launched the pilot cohort of the Birdaro training program for open-source leaders, which will run for 12 weeks until mid-December 2025.
Thanks to strong interest in the program, we have put together a cohort that represents a variety of focus areas, fiscal homes, project stages and project sizes. In an earlier blog post, you can read more about how we intentionally built this cohort of participants and used their input to iteratively shape the pilot curriculum.
In this series of five blog posts, we’re introducing you to the teams taking part in the Birdaro 2025 pilot cohort. In this post, we’re featuring the following open-source software tools for STEM research:
- The R Project – A free software environment for statistical computing and graphics
- RSpace – A platform that enables FAIR (Findable, Accessible, Interoperable, Reusable) data workflows across the entire research lifecycle
- AsyncAPI – An open-source specification for event-driven architecture and tooling for asynchronous APIs
- OpenRefine – A desktop application for data cleanup and transformation to other formats
- Scikit-learn – Tools for predictive data analysis and machine learning
You can read more about each of these projects below, and visit this page of the Birdaro website to learn more about individual team members.

Project bios
Project bios are based on information provided by the project teams during the application and registration process.
The R Project
R is a language and open-source environment that supports the full data science pipeline, from data management, through state-of-the-art statistical and visualisation methods, to the creation of data-based products for end users, such as reports and web applications. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly extensible. The S language, on which R is based, is often the vehicle of choice for research in statistical methodology, and R provides an open-source route to participation in that activity. Despite a large user base, the R Project faces sustainability challenges, especially when it comes to engaging volunteer contributors.
RSpace
RSpace solves the critical problem of fragmented research data management by providing an open-source platform that enables FAIR (Findable, Accessible, Interoperable, Reusable) data workflows across the entire research lifecycle, from planning to publication. It helps researchers and institutions eliminate data silos, improve reproducibility, and seamlessly integrate with existing tools while maintaining comprehensive audit trails and collaborative features essential for modern scientific research. RSpace went from closed source to fully open source in June 2024, after almost two decades of development by ResearchSpace. This kind of transition is relatively rare and requires careful community engagement, including creating intentional governance and decision-making structures.
AsyncAPI
AsyncAPI is an open-source initiative that seeks to improve the current state of Event-Driven Architectures (EDAs). AsyncAPI’s long-term goal is to make working with EDAs as easy as working with REST APIs. That covers everything from documentation to code generation, from discovery to event management, and beyond. Currently, the AsyncAPI team is leading initiatives to train junior contributors to become maintainers through the AsyncAPI Maintainership Program, a key part of their developing community-engagement strategy.
OpenRefine
OpenRefine is an open-source desktop application for data cleanup and transformation to other formats, an activity commonly known as data wrangling. OpenRefine’s goal is to empower everyone to meaningfully engage with data by providing an accessible open-source tool and nurturing a diverse, supportive community. Top of mind for OpenRefine’s leadership right now is financial sustainability – with one long-term grant ending, what does it mean to maintain the project on smaller but more frequent injections of funding?
Scikit-learn
The goal of scikit-learn is to make accurate predictive analysis and machine learning as easy as possible. The project provides simple and efficient tools for predictive data analysis that are accessible to everybody and reusable in various contexts. As a project operating at the interface of generative AI and open source, the team is currently developing strategies to limit low-effort, LLM-generated pull requests, which impact the ability of their community of volunteers to participate.
Additional information
- You can also explore individual participant bios on the Birdaro website.
- You can find out more about the curriculum we’ve developed to support the pilot cohort participants’ shared interest in governance and documentation here.
Check out the other participating projects
Post 1: OpenWellness, movement, Open Source with SLU, and the Community Software Facility (NCAR)
Post 5: CIB Mango Tree, MNE-Python, ArviZ, icepyx, and OpenMRS