Genome-wide data inferring the evolution and population demography of the novel pneumonia coronavirus (SARS-CoV-2)

Abstract

The outbreak of coronavirus disease 2019 (COVID-19), a high-risk and highly infectious disease, poses unprecedented challenges to global health. As of March 3, 2020, SARS-CoV-2 had infected more than 89,000 people in China and 66 other countries across six continents. In this study, we combined 10 newly sequenced SARS-CoV-2 genomes with 136 genomes from the GISAID database to investigate genetic variation and population demography using several approaches (e.g., network analysis, extended Bayesian skyline plot (EBSP), mismatch distribution, and neutrality tests). The results showed that the 80 haplotypes contained 183 substitution sites, including 27 parsimony-informative sites and 156 singletons. Sliding-window analyses of genetic diversity suggested that mutations are fairly abundant across the SARS-CoV-2 genomes, which may help explain how widely the virus has already spread. Phylogenetic analysis showed that the bat coronavirus RaTG13 (bat-RaTG13-CoV) is more closely related to SARS-CoV-2 than the coronavirus carried by pangolins (Pangolin-CoV) is. The network results showed that diverse SARS-CoV-2 haplotypes were present around the world by February 11. Additionally, 16 genomes collected from the Huanan seafood market were assigned to 10 haplotypes, indicating that the infection circulated within the market over a short period. The EBSP results dated the first population expansion of SARS-CoV-2 to 7 December 2019.
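
The site counts and neutrality tests in the abstract rest on standard definitions. A variable site is parsimony-informative when at least two nucleotide states each occur in at least two sequences; any other variable site is a singleton, so the two categories add up to the total number of substitution sites (27 + 156 = 183 here). The Python sketch below illustrates these definitions, a haplotype count, and Tajima's D, one common neutrality test (the abstract does not say which tests were run). It is not the authors' DnaSP workflow: the function names are hypothetical, and dropping every column that contains a gap or ambiguous base is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

VALID = set("ACGT")  # assumption: only unambiguous bases are analysed


def clean_columns(alignment):
    """Drop every column that contains a gap, N or other ambiguity code."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    keep = [i for i in range(length) if all(seq[i] in VALID for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def classify_sites(alignment):
    """Return (parsimony_informative, singleton) counts over clean columns."""
    seqs = clean_columns(alignment)
    informative = singleton = 0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # invariant site
        # number of nucleotide states shared by at least two sequences
        shared = sum(1 for c in counts.values() if c >= 2)
        if shared >= 2:
            informative += 1
        else:
            singleton += 1
    return informative, singleton


def count_haplotypes(alignment):
    """Number of distinct sequences after dropping unclean columns."""
    return len(set(clean_columns(alignment)))


def tajimas_d(alignment):
    """Tajima's D from segregating sites S and mean pairwise differences k."""
    seqs = clean_columns(alignment)
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or s == 0:
        raise ValueError("Tajima's D needs n >= 4 sequences and S > 0")
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    k = sum(diffs) / len(diffs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


toy = ["ACGTA-", "ACGTTA", "AGGTAA", "AGGTAN"]  # hypothetical 4-sequence alignment
print(classify_sites(toy), count_haplotypes(toy), round(tajimas_d(toy), 2))
```

On this toy alignment the last column is dropped (it holds a gap and an N), leaving one parsimony-informative site and one singleton; `classify_sites(toy)` returns `(1, 1)`, `count_haplotypes(toy)` returns `3`, and Tajima's D is about 0.59. A positive D like this means the mean pairwise difference exceeds what the number of segregating sites predicts under neutrality, whereas a recent expansion typically pushes D negative.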

Article activity feed

  1. SciScore for 10.1101/2020.03.04.976662

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    MiSeq v2 reagent kit, and High Output Reagent Cartridge (Illumina, San Diego, CA, USA) were used for deep sequencing on the iSeq 100, MiSeq, and MiniSeq platforms (Illumina, San Diego, CA, USA), respectively.
    Miseq
    suggested: (A5-miseq, RRID:SCR_012148)
    Miniseq
    suggested: None
    Phylogenetic analyses of the genomes were done with RAxML v.
    RAxML
    suggested: (RAxML, RRID:SCR_006086)
    The tree was visualized with FigTree v.
    FigTree
    suggested: (FigTree, RRID:SCR_008515)
    The 136 complete genome sequences of SARS-CoV-2 and an outgroup (bat-RaTG13-CoV) were aligned using MAFFT, and the alignment was then checked manually in Geneious.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Geneious
    suggested: (Geneious, RRID:SCR_010519)
    Population genetic indices, including nucleotide diversity (π) and haplotype diversity (Hd), were estimated in DnaSP.
    DnaSP
    suggested: (DnaSP, RRID:SCR_003067)
    The mismatch distribution of the SARS-CoV-2 genomes was estimated in Arlequin to test the hypothesis of recent population growth.
    Arlequin
    suggested: (ARLEQUIN, RRID:SCR_009051)
    The divergence times were estimated in BEAST using a Bayesian Markov chain Monte Carlo (MCMC) method with a strict clock.
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
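
The DnaSP and Arlequin statistics named in these sentences, together with the sliding-window diversity scan mentioned in the abstract, follow textbook definitions. Nucleotide diversity π is the mean number of pairwise differences divided by the number of sites; haplotype diversity is Hd = n/(n − 1) · (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sequences; the mismatch distribution is the histogram of pairwise differences over all sequence pairs. The sketch below is a minimal Python illustration under stated assumptions, not the authors' pipeline and not how DnaSP or Arlequin are implemented: sequences are equal-length uppercase strings, columns with gaps or ambiguous bases are dropped first, and every function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

VALID = set("ACGT")  # assumption: columns with gaps/N/ambiguity codes are dropped


def clean_columns(alignment):
    """Keep only columns in which every sequence has an unambiguous base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    keep = [i for i in range(length) if all(seq[i] in VALID for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def pairwise_differences(seqs):
    """Number of differing sites for every unordered pair of sequences."""
    return [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences divided by the number of sites."""
    diffs = pairwise_differences(seqs)
    return sum(diffs) / (len(diffs) * len(seqs[0]))


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mismatch_distribution(seqs):
    """Map: number of pairwise differences -> number of sequence pairs."""
    return dict(sorted(Counter(pairwise_differences(seqs)).items()))


def sliding_window_pi(seqs, window, step):
    """(0-based window start, pi) for each full window along the alignment."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


toy = ["ACGTA-", "ACGTTA", "AGGTAA", "AGGTAN"]  # hypothetical alignment
seqs = clean_columns(toy)
print(nucleotide_diversity(seqs), haplotype_diversity(seqs))
print(mismatch_distribution(seqs))
print(sliding_window_pi(seqs, window=2, step=1))
```

For the toy alignment, the six sequence pairs differ at 1, 1, 1, 2, 2 and 0 sites, so π = 7/(6 × 5) ≈ 0.233, Hd = 5/6 and the mismatch distribution is `{0: 1, 1: 3, 2: 2}`. Arlequin goes further and fits a sudden-expansion model to the observed mismatch distribution; this sketch only tabulates it.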

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.