Gender Balance and Readability of COVID-19 Scientific Publishing: A Quantitative Analysis of 90,000 Preprint Manuscripts

This article has been Reviewed by the following groups

Read the full article

Abstract

Releasing preprints is a popular way to hasten the speed of research but may carry hidden risks for public discourse. The COVID-19 pandemic caused by the novel SARS-CoV-2 infection highlighted the risk of rushing the publication of unvalidated findings, leading to damaging scientific miscommunication in the most extreme scenarios. Several high-profile preprints, later found to be deeply flawed, have indeed exacerbated widespread skepticism about the risks of the COVID-19 disease – at great cost to public health. Here, preprint article quality during the pandemic is examined by distinguishing papers related to COVID-19 from other research studies. Importantly, our analysis also investigated possible factors contributing to manuscript quality by assessing the relationship between preprint quality and gender balance in authorship within each research discipline. Using a comprehensive data set of preprint articles from medRxiv and bioRxiv from January to May 2020, we construct both a new index of manuscript quality including length, readability, and spelling correctness and a measure of gender mix among a manuscript’s authors. We find that papers related to COVID-19 are less well-written than unrelated papers, but that this gap is significantly mitigated by teams with better gender balance, even when controlling for variation by research discipline. Beyond contributing to a systematic evaluation of scientific publishing and dissemination, our results have broader implications for gender and representation as the pandemic has led female researchers to bear more responsibility for childcare under lockdown, inducing additional stress and causing disproportionate harm to women in science.

Article activity feed

  1. SciScore for 10.1101/2021.06.14.21258917: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethicsnot detected.
    Sex as a biological variableGiven a first name input, the API would then provide the empirical probability of a male or female gender.
    Randomizationnot detected.
    Blindingnot detected.
    Power Analysisnot detected.

    Table 2: Resources

    Software and Algorithms
    SentencesResources
    Initially, the 91,056 unique preprint documents published to medRxiv (6.9%, n=6,291) or bioRxiv (93.1%, n=84,644) between January 1, 2020 and June 3, 2020 were compiled using the ROpenSci library.
    ROpenSci
    suggested: (rOpenSci, RRID:SCR_013986)
    Using the requests library from Python, the corresponding PDF files were downloaded.
    Python
    suggested: (IPython, RRID:SCR_001658)
    These were then parsed using the PdfFileReader function from the PyPDF2 Python package to extract text from the preprint articles.
    PyPDF2
    suggested: None
    Furthermore, we extracted the subject area of each medRxiv and bioRxiv preprint using the “Subject Area” field displayed on the preprint server’s landing page (e.g., “Infectious Diseases (except HIV/AIDS)” for medRxiv, “Scientific Communication and Education” for bioRxiv).
    bioRxiv
    suggested: (bioRxiv, RRID:SCR_003933)

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.