Comprehensive evolution and molecular characteristics of a large number of SARS-CoV-2 genomes revealed its epidemic trend and possible origins

Abstract

Objectives

To reveal the epidemic trend and possible origins of SARS-CoV-2, which has infected millions of people and spread rapidly worldwide, by exploring its evolution and molecular characteristics across a large number of genomes.

Methods

Various evolutionary analysis methods were employed.

Results

The estimated Ka/Ks ratio of SARS-CoV-2 was 1.008 based on 622 genomes and 1.094 based on 3624 genomes, and the time to the most recent common ancestor (tMRCA) was inferred to be late September 2019. In addition, nine key specific sites in strong linkage and four major haplotypes (H1, H2, H3 and H4) were found. The Ka/Ks ratio, detected population size and development trend of each major haplotype showed that the H3 and H4 subgroups were undergoing purifying evolution and almost disappeared after detection, indicating that H3 and H4 might have existed for a long time, whereas the H1 and H2 subgroups were undergoing near-neutral or neutral evolution and increased globally over time. Notably, the frequency of H1 was generally high in Europe and correlated with death rate (r>0.37).
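The Ka/Ks values above are read against the conventional threshold of 1: a ratio below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution, and a ratio above 1 suggests positive selection. A minimal Python sketch of that reading follows. The function name `interpret_ka_ks` and the `tolerance` band around 1 are illustrative assumptions, not values or methods taken from the study.

```python
def interpret_ka_ks(ratio: float, tolerance: float = 0.1) -> str:
    """Label a Ka/Ks ratio using the conventional threshold of 1.

    `tolerance` is a hypothetical band around 1 used only for this
    illustration; the study does not state a numeric cutoff.
    """
    if ratio < 0:
        raise ValueError("Ka/Ks ratio cannot be negative")
    if ratio > 1 + tolerance:
        return "positive selection"
    if ratio < 1 - tolerance:
        return "purifying selection"
    return "near-neutral evolution"


# The two genome-wide ratios reported in the abstract.
for reported_ratio in (1.008, 1.094):
    print(reported_ratio, interpret_ka_ks(reported_ratio))
```

A real analysis would also test whether the ratio differs significantly from 1 rather than rely on a fixed band.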

Conclusions

In this study, the evolution and molecular characteristics of more than 16000 genomic sequences provided a new perspective on the epidemiology of SARS-CoV-2.

Article activity feed

  1. SciScore for 10.1101/2020.04.24.058933:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Finally, 624 high quality genomes with precise collection time were selected and aligned using MAFFT v7 with automatic parameters.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Estimate of evolution rate and the time to the most recent common ancestor for SARS-CoV, MERS-CoV, and SARS-CoV-2: The average Ka, Ks and Ka/Ks for all coding sequences were calculated using KaKs_Calculator v1.2(Zhang, et al., 2006), and the substitution rate and tMRCA were estimated using BEAST v2.6.2(Bouckaert, et al., 2019).
    KaKs_Calculator
    suggested: None
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    The temporal signal with root-to-tip divergence was visualized in TempEst v1.5.3(Rambaut, et al., 2016) using a ML whole genome tree with bootstrap value as input.
    TempEst
    suggested: (TempEst, RRID:SCR_017304)
    The output was examined in Tracer v1.6 (http://tree.bio.ed.ac.uk/software/tracer/).
    Tracer
    suggested: (Tracer, RRID:SCR_019121)
    Variant calling of SARS-CoV-2 genome sequences: Each genome sequence was aligned to the reference genome (NC_045512.2) using bowtie2 with default parameters (Langmead and Salzberg, 2012), and variants were called by samtools (sort; mpileup -gf) and bcftools (call -vm).
    bowtie2
    suggested: (Bowtie 2, RRID:SCR_016368)
    samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    The SMS method was used to select GTR+G as the base substitution model (Lefort, et al., 2017), and PhyML 3.1 (Guindon, et al., 2010) and MEGA (Kumar, et al., 2018) were used to construct the unrooted phylogenetic tree by the maximum likelihood method with a bootstrap value of 100.
    PhyML
    suggested: (PhyML, RRID:SCR_014629)
    Phylogenetic network of haplotype subgroups: The phylogenetic networks were inferred by PopART package v1.7.2(Leigh, et al., 2015) using TCS and minimum spanning network (MSN) methods respectively.
    PopART
    suggested: None
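The root-to-tip temporal-signal check listed in the table (TempEst) amounts to regressing each genome's genetic distance from the root on its sampling date. The slope approximates the substitution rate and the x-intercept gives a rough tMRCA. The sketch below is a plain least-squares illustration of that idea in Python, not the TempEst implementation. The function names and the input format (decimal years, distances in substitutions per site) are assumptions made for illustration.

```python
from datetime import date


def to_decimal_year(d: date) -> float:
    """Convert a calendar date to a decimal year (e.g. 2020-01-01 -> 2020.0)."""
    start = date(d.year, 1, 1).toordinal()
    end = date(d.year + 1, 1, 1).toordinal()
    return d.year + (d.toordinal() - start) / (end - start)


def root_to_tip_regression(sampling_years, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling time.

    Returns (slope, intercept, x_intercept). The slope approximates the
    substitution rate per site per year; the x-intercept approximates the
    tMRCA. x_intercept is None when the slope is not positive, i.e. when
    there is no usable temporal signal.
    """
    n = len(sampling_years)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two (time, distance) pairs of equal length")
    mean_x = sum(sampling_years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in sampling_years)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sampling_years, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    x_intercept = -intercept / slope if slope > 0 else None
    return slope, intercept, x_intercept
```

In practice the distances would come from the ML whole-genome tree described in the table. The Bayesian estimate from BEAST remains the reported tMRCA, and this regression serves only as a sanity check on the temporal signal.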

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.