Experimenting with reproducibility in bioinformatics

Abstract

Reproducibility has been shown to be limited in many scientific fields. Although reproducibility is a fundamental tenet of scientific activity, the related issues of reusability of scientific data are poorly documented. Here, we present a case study of our attempt to reproduce a promising bioinformatics method [1] and illustrate the challenges of using a published method for which code and data were available. First, we tried to re-run the analysis with the code and data provided by the authors. Second, we reimplemented the method in Python to avoid dependency on a MATLAB licence and to ease execution of the code on an HPCC (High-Performance Computing Cluster). Third, we assessed the reusability of our reimplementation and the quality of our documentation. Then, we experimented with our own software and tested how easy it would be to start from our implementation to reproduce the results, thereby attempting to estimate the robustness of the reproducibility. Finally, in a second part, we propose solutions drawn from this case study and other observations to improve reproducibility and research efficiency at the individual and collective levels.

Availability

The latest version of StratiPy (Python), with two examples of reproducibility, is available on GitHub [2].

Contact

yang-min.kim@pasteur.fr

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giy077

    Yang-Min Kim (1, 2, 3, 4), Jean-Baptiste Poline (5), Guillaume Dumas (1, 2, 3, 4)

    1 Institut Pasteur, Human Genetics and Cognitive Functions Unit, Paris, France
    2 CNRS UMR 3571 Genes, Synapses and Cognition, Institut Pasteur, Paris, France
    3 University Paris Diderot, Sorbonne Paris Cité, Paris, France
    4 Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur and CNRS), Paris, France
    5 Henry H. Wheeler Jr. Brain Imaging Center, Helen Wills Neuroscience Institute, University of California, Berkeley, California, USA

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giy077 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.101237
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.101238