COVID-19 Preprints and Their Publishing Rate: An Improved Method


Abstract

Context

As the COVID-19 pandemic persists around the world, the scientific community continues to produce and circulate knowledge on the deadly disease at an unprecedented rate. During the early stage of the pandemic, preprints represented nearly 40% of the English-language COVID-19 scientific corpus (6,000+ preprints | 16,000+ articles). As of mid-August 2020, that proportion had dropped to around 28% (13,000+ preprints | 49,000+ articles). Nevertheless, preprint servers remain a key engine in the efficient dissemination of scientific work on this infectious disease. But, given the ‘uncertified’ nature of the scientific manuscripts curated on preprint repositories, their integration into the global ecosystem of scientific communication creates serious tensions. This is especially the case for biomedical knowledge, since the dissemination of bad science can have widespread societal consequences.

Scope

In this paper, I propose a robust method that allows the repeated monitoring and measurement of COVID-19 preprints’ publication rate. I also introduce a new API called Upload-or-Perish, a micro-API service that enables a client to query a specific preprint manuscript’s publication status and associated metadata using a unique ID. This tool is in active development.
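The paper does not document the Upload-or-Perish endpoint layout, so the following is only a minimal sketch of what a client-side query-by-ID might look like; the base URL, path, and function name are all hypothetical placeholders:

```python
from urllib.parse import urljoin, quote

# Hypothetical base URL: the real service's endpoint is not specified in the paper.
BASE_URL = "https://api.upload-or-perish.example/v1/"

def build_status_query(preprint_id: str) -> str:
    """Build a URL to query a preprint's publication status by its unique ID.

    The ID (e.g. a DOI) is percent-encoded so that slashes survive as one
    path segment.
    """
    return urljoin(BASE_URL, "preprints/" + quote(preprint_id, safe=""))

# Example: query by a medRxiv DOI.
url = build_status_query("10.1101/2020.09.04.20188771")
```

A real client would then issue an HTTP GET against this URL and parse the returned publication status and metadata.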

Data

I use the COVID-19 Open Research Dataset (CORD-19) to calculate the COVID-19 preprint corpus’s conversion rate to peer-reviewed articles. The CORD-19 dataset includes preprints from arXiv, bioRxiv, and medRxiv.
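Separating preprints from peer-reviewed records in CORD-19 comes down to filtering on the source column of its metadata file. A minimal sketch, using a toy inline sample in place of the real `metadata.csv` (actual column names and source labels may differ slightly across CORD-19 releases):

```python
import csv
import io

# Toy stand-in for CORD-19's metadata.csv; column names follow the
# commonly documented schema (cord_uid, source_x, title, publish_time).
SAMPLE = """cord_uid,source_x,title,publish_time
aaa111,biorxiv,A COVID-19 preprint,2020-03-01
bbb222,PMC,A peer-reviewed article,2020-04-15
ccc333,medrxiv,Another COVID-19 preprint,2020-05-20
"""

# The three preprint servers covered in this paper.
PREPRINT_SOURCES = {"arxiv", "biorxiv", "medrxiv"}

def split_corpus(fh):
    """Partition CORD-19 metadata rows into preprints and journal articles."""
    preprints, articles = [], []
    for row in csv.DictReader(fh):
        bucket = preprints if row["source_x"].lower() in PREPRINT_SOURCES else articles
        bucket.append(row)
    return preprints, articles

preprints, articles = split_corpus(io.StringIO(SAMPLE))
```

On the real dataset, the same function would be fed an open file handle to `metadata.csv`.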

Methods

I utilize conditional fuzzy logic on article titles to determine whether a preprint has a published counterpart in the database. My approach is an important departure from previous studies, which rely exclusively on the bioRxiv API to ascertain preprints’ publication status. This is problematic because the level of false positives in bioRxiv metadata could be as high as 37%.
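The paper does not spell out the exact matching parameters, but the general technique of conditional fuzzy title matching can be sketched as follows: normalize both titles, short-circuit on exact equality, skip the expensive comparison when lengths differ wildly, and otherwise require a high similarity ratio. The 0.9 threshold and the length condition here are illustrative assumptions, not the paper's actual settings:

```python
from difflib import SequenceMatcher

def titles_match(preprint_title: str, article_title: str,
                 threshold: float = 0.9) -> bool:
    """Decide whether two titles likely refer to the same manuscript.

    Normalizes case and whitespace, then applies fuzzy matching only
    when a cheap length check does not already rule the pair out.
    """
    a = " ".join(preprint_title.lower().split())
    b = " ".join(article_title.lower().split())
    if a == b:  # exact match after normalization: no fuzzy step needed
        return True
    # Conditional step: skip the quadratic comparison when lengths
    # differ by more than half of the longer title.
    if abs(len(a) - len(b)) > 0.5 * max(len(a), len(b)):
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

In practice each preprint title would be compared against candidate journal-article titles in CORD-19, with a match flagging the preprint as published.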

Findings

My analysis reveals that around 15% of COVID-19 preprint manuscripts in the CORD-19 dataset that were uploaded to arXiv, bioRxiv, and medRxiv between January and early August 2020 were published in a peer-reviewed venue. Compared to the most recent measure available, this represents a two-fold increase over a period of two months. My discussion reviews and theorizes on the potential explanations for COVID-19 preprints’ low conversion rate.

Article activity feed

  1. SciScore for 10.1101/2020.09.04.20188771:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Limitations & Future Directions: In this article, the main goal was to measure the proportion of COVID-19 preprints that were published in peer-reviewed journals between January and August 2020. I followed with a discussion, perhaps speculative, of some reasons that can help explain the low publication rate of COVID-19 preprints. Most notably, I coined the concept of “performative assent” to articulate how a set of relational metrics – views, downloads, citations, peer comments, altmetrics – can, in the age of social media, help construct scientific legitimacy (or the illusion of it) in a loosely defined post-publication peer-review process. I also point out that in a context where multi-level cooperation (civil society, government, private sector, science) is paramount to fighting the pandemic, scientific competition remains alive and well. In other words, redundancy happens as researchers fight for the limited prime space in top academic journals. I encourage students of preprint culture and knowledge production to use social surveys to start documenting the experiences and decision-making of COVID-19 researchers who use preprint digital repositories. Questions pertaining to their experiences of submission to peer-reviewed venues, rejection, and revision (the whole cycle) can help illuminate emerging practices and tactics around preprints. Of particular interest is researchers’ thinking (and justification) around the decision not to submit their manuscript to peer-rev...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.