The SORTEE Guidelines for Data and Code Quality Control in Ecology and Evolutionary Biology


Abstract

Open data and code are crucial to increasing transparency and reproducibility, and to building trust in scientific research. However, despite an increasing number of journals in ecology and evolutionary biology mandating that data and code be archived alongside published articles, the amount and quality of archived data and code, and the subsequent reproducibility of results, have remained worryingly low. As a result, a handful of journals have recruited dedicated data editors, whose role is to help authors increase the overall quality of archived data and code. There is, however, a general lack of consensus on what a data editor should check, how to do it, and to what level of detail, and the process is often vague and hidden from readers and authors alike. Here, with input from multiple data editors across several journals in ecology and evolutionary biology, we establish and describe the first standardised guidelines for Data and Code Quality Control on behalf of the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology (SORTEE). We start by introducing the concept of a data editor and of data and code quality control, what is expected from data and code quality control, and the relative costs and benefits to journals, authors, and readers; we then introduce and detail the SORTEE-led guidelines, ending with advice for journals and authors. We believe that by adopting these standardised guidelines, journals will help increase the consistency and transparency of the data editor process for readers, authors, and data editors.

Article activity feed

  1. The scientific publishing system is in constant change. Born more than two hundred years ago to share the latest advances in science, scientific manuscripts soon became the standard way to advertise new discoveries. I stress advertise, because the research itself rarely fits the written format: scientific manuscripts show the highlights and main conclusions of an arduous process of gathering data and running analyses, but the classic paper format only allows the final graphs to be shown. In fact, the complexity of evaluating whether a scientific manuscript is sound is what gave rise, relatively recently, to the role of reviewers. I say recently because, despite earlier examples, peer review was not generally adopted until the 1960s, and influential journals such as Nature only implemented it in 1973. Nowadays, reviewers play a central role in advising editors on the correctness of a manuscript, but can we do better?

    Of course we can. With the advent of information technologies, at least part of the research process can now reclaim its central place in the publication process. The data gathered can easily be made available, and all the steps and decisions taken to analyse them and extract conclusions can be openly shared. We can finally evaluate the process, and not only the outcomes, of research. Accordingly, many journals and authors are pushing for the publication of data and, in recent years, also the code used to analyse those data. The advantages are clear, enhancing not only transparency and reproducibility, but also data and code re-use.

    This new publication paradigm requires journals to adapt, and a new figure is emerging: the data and code editor. As with all new paradigms, a variety of approaches and initiatives currently coexist, and I applaud the effort of SORTEE members to gather a wide representation of data and code editors in ecology and evolution to provide clear guidelines. Pick et al. (2026) present a detailed document explaining six steps to ensure data and code reproducibility. These six steps range from basic requirements to implementing a gold standard of data and code reproducibility, allowing for some flexibility while new norms emerge in our field.

    I want to highlight two critical points. First is how clearly the authors articulate the role of data editors in data and code quality control: it is not about verifying the actual correctness of the data and code, but rather about ensuring that data and code are available, that the code runs properly, and that the data are in an appropriate format with the corresponding metadata so they can be scrutinised. Second, the manuscript is illustrative because it clearly articulates "why" it is worth adopting these best practices, rather than being limited to explaining "what" should be done. In doing so, it sets clear expectations not only for editors, but also for journals and authors.

    While many open questions remain, such as how to reward the time-consuming role of being a data editor, or how the increasingly widespread use of Large Language Models might interact with these new roles, this paper constitutes a strong manifesto on why adhering to high standards will benefit everyone.

    References

    Pick, J. L., Allen, B., Bachelot, B., Bairos-Novak, K., Brand, J., Class, B., Dallas, T., D'Amelio, P., Fenollosa, E., Fernández-Juricic, E., Gomes, D., Grainger, M., Guillemaud, T., John, C., Krasnow, R., Lagisz, M., Lequime, S., Maynard, D., Nakagawa, S., O'Dea, R., Paquet, M., Petitjean, Q., Sánchez-Tójar, A., van Dis, N., Wilson, L., & Ivimey-Cook, E. R. (2026). The SORTEE Guidelines for Data and Code Quality Control in Ecology and Evolutionary Biology. EcoEvoRxiv, ver. 3, peer-reviewed and recommended by PCI Ecology. https://doi.org/10.32942/X24P8S