Project focus
Research should be reproducible. The same analysis of the same data should produce the same result. This may sound obvious at first, but many studies show that a large portion of the scientific literature across disciplines is not reproducible. Research data and analysis code are often not readily shared, leaving other researchers virtually no way to independently verify published scientific results. In some cases, even the scientists who conducted the research are unable to reproduce their own findings, sometimes only shortly after the work is published. The public's return on its investment in a particular research study is often limited to a PDF article, frequently behind a paywall, on a for-profit publisher's website.
Conducting reproducible research is hampered by at least two factors. First, the incentives in academic research are not conducive to it: the quantity of publications and the impact factor of the journals in which they appear often still carry more weight in hiring and promotion decisions than the quality, robustness, and reproducibility of the work. Second, making research reproducible is genuinely difficult, and prospective researchers often lack targeted and comprehensive training in the relevant methods and tools.
Reproducing the same results on a different computer is often not a trivial problem. For example, not only must all code and data be available in an accessible format, but the same software (or computing environment) must also be recreated. In general, managing research results reproducibly and transparently throughout their entire lifecycle—from initial idea to publication and beyond—represents a significant challenge. Fortunately, researchers can learn from practices and tools from other disciplines, particularly software engineering, which has significantly professionalized collaborative work on digital objects such as code and data. These include, among others, tracking changes to digital objects using version control systems such as Git, best practices for code and data management, and creating stable and portable computing environments using software containers such as Docker.
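The two tools named above can be combined in a few commands. The following is a minimal sketch, not part of the course material: the project name, file names, and commit messages are hypothetical, and the `rocker/r-ver` base image is one common way to pin an exact R version in Docker.

```shell
# Sketch: track analysis code with Git and pin the computing
# environment with Docker (all names here are hypothetical).
mkdir repro-project
cd repro-project
git init -q

echo 'summary(iris)' > analysis.R        # placeholder analysis script
git add analysis.R
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Add initial analysis script"

# A minimal Dockerfile pinning a specific R version:
cat > Dockerfile <<'EOF'
FROM rocker/r-ver:4.3.2
COPY analysis.R /analysis.R
CMD ["Rscript", "/analysis.R"]
EOF
git add Dockerfile
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Pin computing environment with Docker"

git log --oneline    # both commits are now part of the project history
```

With the Dockerfile committed alongside the code, another researcher can rebuild the identical software environment with `docker build` rather than guessing which R version was used.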
This course provided an introduction to the tools and procedures that enable young scientists to make their research reproducible.
Review and results
Following the successful approach of our previous course on “Version Control of Code and Data with Git,” we focused on developing an online learning resource titled “The Repro Book” (available here), modeled after our “Version Control Book.” This online guide was adapted to the structure of the seminar and served as the course textbook.
The main outcome of the teaching project was that course participants learned about reproducible work in a scientific context, particularly by applying it to a small, fictitious research project. Through practical exercises and exposure to concepts such as metadata, community guidelines, renv, good coding practices, and literate programming using the software Quarto, R, and RStudio, they gained in-depth knowledge of how to set up a research project that is easily reproducible.
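To make the literate-programming workflow concrete, here is a minimal sketch of a Quarto document of the kind such a project might contain. The file name and contents are hypothetical, and the commented commands at the end only indicate the usual renv and Quarto calls; they require R, renv, and Quarto to be installed.

````shell
# Sketch: create a minimal Quarto document that mixes narrative text
# with an executable R code chunk (file name and contents hypothetical).
cat > report.qmd <<'EOF'
---
title: "Fictitious Research Project"
format: html
---

## Analysis

```{r}
summary(iris)
```
EOF

# Rendering the document (requires Quarto and R):
# quarto render report.qmd

# Pinning R package versions with renv (run inside an R session):
# renv::init()      # creates renv.lock recording exact package versions
# renv::snapshot()  # updates the lockfile after installing new packages
````

Because the report's text, code, and environment description (`renv.lock`) live together in one repository, rerunning the analysis reproduces both the results and the rendered document.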
Tips from lecturers for lecturers
A central element of digital skills development was the transparent, public, and collaborative development of the teaching and learning resources using Git and GitHub, as well as the Open Science Framework, within the available time. The teaching content itself was thus applied in creating the teaching materials. This approach follows sound scientific principles of transparency, openness, and reproducibility.
The teaching project also contributed to the development of the lecturers' didactic skills. Various didactic methods were tested within the teaching project: from pair work and individual work to quizzes, demonstrations, and exercises. A conscious focus was on creating ample time and space for hands-on practice, during which participants could apply what they had learned to concrete exercises and examples.