Analytical code sharing practices in biomedical research
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (PREreview)
- ASAPbio Meta-Research Crowd PREreviews (prereview)
Abstract
Data-driven computational analysis is becoming increasingly important in biomedical research as the amount of data being generated continues to grow. However, the lack of practices for sharing research outputs, such as data, source code, and methods, affects the transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible due to insufficient documentation, code, and data being shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them failed to share the analytical code. Even among those that did disclose their code, the vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We discovered a significant association between the presence of code availability statements and increased code availability (p=2.71×10⁻⁹). Additionally, a greater proportion of studies conducting secondary analyses were inclined to share their code compared to those conducting primary analyses (p=1.15×10⁻⁷). In light of our findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability in order to improve reproducibility in biomedical research. By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discoveries. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.
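The association reported in the abstract (code availability statements vs. actual code availability) is the kind of result typically obtained with a chi-square test of independence on a 2×2 contingency table. A minimal sketch in plain Python, using purely illustrative counts rather than the study's actual data:

```python
# Hypothetical 2x2 contingency table (counts are illustrative only, not
# the study's data): rows = code availability statement present/absent,
# columns = code shared yes/no.
table = [[166, 29],    # statement present
         [61, 197]]    # statement absent

def chi_square_2x2(t):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row_totals = [sum(r) for r in t]
    col_totals = [sum(c) for c in zip(*t)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (t[i][j] - expected) ** 2 / expected
    return stat

# A statistic above 10.83 (the 1-df critical value) implies p < 0.001.
print(chi_square_2x2(table))
```

In practice one would use `scipy.stats.chi2_contingency`, which also returns the p-value directly; the manual version above just makes the arithmetic behind the reported p-values explicit.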
Article activity feed
-
This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/8380096.
This review reflects comments and contributions from Jessica Polka and Stephen Gabrielson. Review synthesized by Stephen Gabrielson.
This study analyzed code sharing rates in articles published between 2016 and 2021 in a selection of 8 biomedical journals. From their analysis, the authors found that approximately half of the manuscripts did not share associated code, and, among those that did, most did not present the code in an organized way that would make it easily reproducible. The authors conclude by presenting five principles to help increase rates of code sharing and promote open science practices.
Major comments
-
The paper presents "five principles to increase code availability and archival stability" but only four are listed.
Minor comments
-
In the sentence "We conducted a comprehensive analysis of 453 manuscripts published between 2016-2021" in the abstract, it would be helpful to indicate how many of these manuscripts utilized code.
-
How were the journals for this study chosen?
-
"Data sharing is a more prevalent practice than code sharing in biomedical research." It may be difficult to draw this conclusion from a sample of 8 journals within the same field. Similarly, acknowledging the caveat of the limited sample of journals throughout the manuscript (e.g., when mentioning the "most commonly used repository", etc.) could help provide context to readers.
-
"This suggests that including a data availability statement in a manuscript can improve access to shared data." Because correlation may not indicate causality here, this statement could be softened.
-
While journals are called out in one of the principles for increasing code availability, I wonder if funders should also be explicitly called out to require that investigators share their code and other relevant outputs.
-
Principle number three mentions a "badge system" for encouraging researchers to share their code. Including a mention here of the Open Science Badges from the Center for Open Science would be helpful: https://www.cos.io/initiatives/badges
-
I agree with principle four that training can be a great way to educate researchers about sharing their code and other open science outputs. I would also add that librarians are often a great resource for providing training and guidance on publishing and open science topics.
Comments on reporting
-
"Out of the 453 manuscripts analyzed, 43% contained code availability statements, and 85% of those manuscripts shared code." Does the 85% refer only to the manuscripts with code availability statements, or to all 453?
Competing interests
The author declares that they have no competing interests.