Improving Reproducibility of Data Analysis and Code in Medical Research - 5 Recommendations to Get Started

Abstract

Due to the growing use of high-dimensional data and methodological advances in medical research, the reproducibility of research increasingly depends on the availability of reproducible code. However, code is rarely made available and is too often only partly reproducible. Here, we aim to provide practical, easily implementable recommendations that help medical researchers improve the reproducibility of their code. We reviewed current coding practices in the population-based Rotterdam Study cohort. Based on this review, we formulated five recommendations to improve the reproducibility of code used in data analysis: (1) make reproducibility a priority and allocate time and resources to it; (2) implement systematic code review by peers, as it further strengthens reproducibility; we provide a code review checklist as a practical tool for structuring this review; (3) write comprehensible, well-structured code; (4) report decisions transparently, for instance by providing the annotated workflow code for data cleaning, formatting, and sample selection; (5) focus on the accessibility of code and data, and share both via an open repository when possible. Ideally, this repository should be managed by the institution and be accessible to everyone. By following these five recommendations, medical researchers can take actionable steps to improve the reproducibility of their research. Importantly, these recommendations are intended as a practical starting point for enhancing reproducibility rather than as mandatory guidelines.
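
As a minimal sketch of recommendation (4), the Python example below shows one way to make sample selection transparent. Each exclusion criterion is written as a named, ordered step, and the number of participants removed at each step is logged so that it can be reported, for example in a participant flow chart. The `Participant` fields, the `select_sample` helper, the exclusion criteria (including the 45-85 age range), and the toy records are all hypothetical and chosen only for illustration. They are not taken from the Rotterdam Study or from the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Participant:
    # Hypothetical fields, for illustration only
    id: int
    age: Optional[float]
    exposure: Optional[float]
    consent: bool


# One step = (description to report, predicate that KEEPS a participant)
Step = tuple[str, Callable[[Participant], bool]]


def select_sample(participants: list[Participant], steps: list[Step]):
    """Apply exclusion steps in order and log how many participants each removes."""
    log = []
    current = list(participants)
    for description, keep in steps:
        kept = [p for p in current if keep(p)]
        # Record (criterion, number removed, number remaining) for reporting
        log.append((description, len(current) - len(kept), len(kept)))
        current = kept
    return current, log


# Hypothetical exclusion criteria, applied in the order listed
STEPS: list[Step] = [
    ("Excluded: no informed consent", lambda p: p.consent),
    ("Excluded: missing exposure measurement", lambda p: p.exposure is not None),
    ("Excluded: age outside 45-85 years",
     lambda p: p.age is not None and 45 <= p.age <= 85),
]

if __name__ == "__main__":
    # Synthetic toy records, not real data
    toy = [
        Participant(1, 60, 1.2, True),
        Participant(2, 50, None, True),
        Participant(3, 90, 0.8, True),
        Participant(4, 70, 2.0, False),
        Participant(5, 55, 1.5, True),
    ]
    sample, log = select_sample(toy, STEPS)
    print(f"Start: n={len(toy)}")
    for description, n_removed, n_left in log:
        print(f"{description}: -{n_removed} (n={n_left})")
```

Keeping all criteria in a single ordered list means a peer reviewer can inspect each criterion and its position in one place, and the logged counts can be checked against the numbers reported in the manuscript.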
