The origin and underlying driving forces of the SARS-CoV-2 outbreak

This article has been reviewed by the following groups

Abstract

Background

SARS-CoV-2 began spreading in December 2019 and has since become a pandemic that has impacted many aspects of human society. Several issues concerning the origin, time of introduction to humans, evolutionary patterns, and underlying forces driving the SARS-CoV-2 outbreak remain unclear.

Method

Genetic variation in 137 SARS-CoV-2 genomes and related coronaviruses as of 2/23/2020 was analyzed.

Result

After correcting for mutational bias, an excess of low-frequency mutations at both synonymous and nonsynonymous sites was revealed, which is consistent with the recent outbreak of the virus. In contrast to the adaptive evolution previously reported for SARS-CoV during its brief epidemic in 2003, our analysis of SARS-CoV-2 genomes shows signs of relaxed selection. The sequence similarity in the spike receptor binding domain between SARS-CoV-2 and a sequence from pangolin is probably due to an ancient intergenomic introgression that occurred approximately 40 years ago. The current outbreak of SARS-CoV-2 was estimated to have originated on 12/11/2019 (95% HPD 11/13/2019–12/23/2019). The effective population size of the virus showed an approximately 20-fold increase from the onset of the outbreak to the lockdown of Wuhan (1/23/2020) and ceased to increase afterwards, demonstrating the effectiveness of social distancing in preventing its spread. Two mutations, 84S in the orf8 protein and 251V in the orf3 protein, occurred coincidentally with human intervention. The former first appeared on 1/5/2020 and plateaued around 1/23/2020. The latter rapidly increased in frequency after 1/23/2020. Thus, the roles of these mutations in infectivity need to be elucidated. Genetic diversity of SARS-CoV-2 collected from China is two times higher than that of viruses collected from the rest of the world. A network analysis found that haplotypes collected from Wuhan were interior and had more mutational connections, both of which are consistent with the observation that the SARS-CoV-2 outbreak originated in China.
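
An excess of low-frequency mutations is often summarized with Tajima's D, which compares the mean pairwise difference with the number of segregating sites; negative values indicate a skew toward rare variants. The abstract does not name the statistic the authors used, and the sketch below does not correct for mutational bias, so it is only an illustration of the general idea. The function name `tajimas_d` and the toy alignments are hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence


def tajimas_d(seqs: Sequence[str]) -> Optional[float]:
    """Tajima's D for equal-length aligned sequences (illustrative sketch).

    Assumes a gap-free alignment; returns None when no site is variable.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    total_pairs = n * (n - 1) // 2

    seg_sites = 0   # S: number of segregating (variable) sites
    pi = 0.0        # mean number of pairwise differences
    for column in zip(*seqs):
        counts = {}
        for base in column:
            counts[base] = counts.get(base, 0) + 1
        if len(counts) > 1:
            seg_sites += 1
            identical_pairs = sum(c * (c - 1) // 2 for c in counts.values())
            pi += (total_pairs - identical_pairs) / total_pairs

    if seg_sites == 0:
        return None

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy data: every variant is carried by a single sequence (rare variants).
singletons = ["CAAAA", "ACAAA", "AACAA", "AAACA", "AAAAC", "AAAAA"]
# Toy data: every variant is carried by half of the sequences.
balanced = ["CCCCC"] * 3 + ["AAAAA"] * 3
```

With these toy inputs, `tajimas_d(singletons)` is negative (excess of rare variants) and `tajimas_d(balanced)` is positive (excess of intermediate-frequency variants).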

Conclusion

SARS-CoV-2 might have cryptically circulated within humans for years before being discovered. Data from the early outbreak and hospital archives are needed to trace its evolutionary path and determine the critical steps required for effective spreading.

Article activity feed

  1. SciScore for 10.1101/2020.04.12.038554:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Sequence analyses and phylogeny construction: CDSs were aligned based on translated amino acid sequences using MUSCLE v3.8.31 [40], and back-translated to their corresponding DNA sequences using TRANALIGN software from the EMBOSS package (http://emboss.open-bio.org/) [41].
    MUSCLE
    suggested: (MUSCLE, RRID:SCR_011812)
    EMBOSS
    suggested: (EMBOSS, RRID:SCR_008493)
    Number of nonsynonymous changes per nonsynonymous site (dN) and synonymous changes per synonymous site (dS) among genomes were estimated based on Li-Wu-Luo’s method [45] implemented in MEGA-X and PAML 4 [46].
    PAML
    suggested: (PAML, RRID:SCR_014932)
    The RDP file for the haplotype network analyses was generated using DnaSP 6.0 [47] and input into Network 10 (https://www.fluxus-engineering.com/) to construct the haplotype network using the median joining algorithm.
    https://www.fluxus-engineering.com/
    suggested: (Fluxus Engineering, RRID:SCR_008618)
    The four-haplotype test implemented in DnaSP was applied to test for possible recombination events.
    DnaSp
    suggested: (DnaSP, RRID:SCR_003067)
    The mutation rate of SARS-CoV-2 and the time to the most recent common ancestor (TMRCA) of virus isolates were estimated by an established Bayesian MCMC approach implemented in BEAST version 1.10.4 [48].
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
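
The four-haplotype test quoted above (more commonly called the four-gamete test) rests on a simple idea: under an infinite-sites model without recombination, two biallelic sites can show at most three of the four possible two-site combinations. The sketch below illustrates that logic only; it is not DnaSP's implementation, and the function names and toy alignments are hypothetical. It assumes a gap-free alignment and skips sites with more than two alleles.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def biallelic_sites(seqs: Sequence[str]) -> List[int]:
    """Return alignment positions that carry exactly two distinct alleles."""
    return [pos for pos, column in enumerate(zip(*seqs))
            if len(set(column)) == 2]


def four_gamete_pairs(seqs: Sequence[str]) -> List[Tuple[int, int]]:
    """Return pairs of biallelic sites that show all four two-site haplotypes.

    Such pairs are incompatible with a single tree under infinite sites,
    which suggests recombination or recurrent mutation between the sites.
    """
    hits = []
    for i, j in combinations(biallelic_sites(seqs), 2):
        gametes = {(s[i], s[j]) for s in seqs}
        if len(gametes) == 4:
            hits.append((i, j))
    return hits


# Toy alignments: the first contains all four combinations at sites 0 and 1,
# the second contains only three.
incompatible = ["AA", "AT", "TA", "TT"]
compatible = ["AA", "AT", "TT"]
```

Here `four_gamete_pairs(incompatible)` reports the site pair `(0, 1)`, while `four_gamete_pairs(compatible)` reports nothing. A hit flags a possible recombination event but cannot by itself rule out recurrent mutation.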

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
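
As a rough illustration of what checking for the presence of RRIDs can look like, the sketch below pulls identifiers of the form shown in Table 2 (for example `RRID:SCR_011812`) out of free text. This is not SciScore's implementation: the regular expression and the name `find_rrids` are assumptions, the pattern covers only the simple prefix-underscore-identifier shape, and it says nothing about whether an identifier is correct.

```python
import re
from typing import List

# Assumed pattern: "RRID:" followed by an uppercase registry prefix,
# an underscore, and an alphanumeric identifier (e.g. SCR_011812).
RRID_PATTERN = re.compile(r"RRID:\s*([A-Z]+_\w+)")


def find_rrids(text: str) -> List[str]:
    """Return RRID identifiers found in text, in order of appearance."""
    return RRID_PATTERN.findall(text)


sample = "suggested: (MUSCLE, RRID:SCR_011812) and (EMBOSS, RRID:SCR_008493)"
rrids = find_rrids(sample)
```

For the sample string, `rrids` holds `SCR_011812` and `SCR_008493`. Confirming that an identifier resolves to the intended resource would require a lookup against the RRID registry, which this sketch does not attempt.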