Phylogenetic Analysis Of SARS-CoV-2 In The First Months Since Its Emergence

Abstract

During the first months of SARS-CoV-2 evolution in a new host, contrasting hypotheses have been proposed about the way the virus has evolved and diversified worldwide. The aim of this study was to perform a comprehensive evolutionary analysis to describe the human outbreak and the evolutionary rate of different genomic regions of SARS-CoV-2.

The molecular evolution of nine genomic regions of SARS-CoV-2 was analyzed using three different approaches: phylogenetic signal assessment, emergence of amino acid substitutions, and Bayesian evolutionary rate estimation in eight successive fortnights since the emergence of the virus.
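The fortnight binning mentioned above could be sketched as follows. This is only an illustration: the abstract does not state the start date or whether a fortnight means exactly 14 days, so `START_DATE`, `FORTNIGHT_DAYS`, and the function names are assumptions, not the authors' procedure.

```python
from collections import Counter
from datetime import date

# ASSUMPTION: a fortnight is exactly 14 days (the abstract does not say).
FORTNIGHT_DAYS = 14
# ASSUMPTION: hypothetical placeholder for the first day of fortnight 1.
START_DATE = date(2019, 12, 24)


def fortnight_index(collection_date: date, start_date: date = START_DATE) -> int:
    """Return the 1-based fortnight that contains collection_date."""
    delta_days = (collection_date - start_date).days
    if delta_days < 0:
        raise ValueError("collection date precedes the start date")
    return delta_days // FORTNIGHT_DAYS + 1


def count_per_fortnight(dates, n_fortnights: int = 8) -> list:
    """Count sequences per fortnight, ignoring dates beyond the last fortnight."""
    counts = Counter(fortnight_index(d) for d in dates)
    return [counts.get(i, 0) for i in range(1, n_fortnights + 1)]
```

With real data, `dates` would be the collection dates attached to the sequence records; sequences sampled after the eighth fortnight are simply left out of the counts.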

All observed phylogenetic signals were very low, and the tree topologies were in agreement with those signals. However, after four months of evolution, it was possible to identify regions revealing incipient viral lineage formation from fortnight 3 onward, despite the low phylogenetic signal. Finally, the SARS-CoV-2 evolutionary rates for regions nsp3 and S, the ones presenting the greatest variability, were estimated at 1.37 × 10⁻³ and 2.19 × 10⁻³ substitutions/site/year, respectively.
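To give a feel for the magnitude of these rates, the sketch below converts them into an expected number of substitutions per region. The region lengths are an assumption: they are the full-length coding regions in the Wuhan-Hu-1 reference (NC_045512.2), and the alignments actually analyzed may have been shorter.

```python
# Rates quoted in the abstract (substitutions/site/year).
RATES = {"nsp3": 1.37e-3, "S": 2.19e-3}

# ASSUMPTION: full-length coding regions in Wuhan-Hu-1 (NC_045512.2), in nucleotides.
# The regions analyzed in the study may differ.
REGION_LENGTH_NT = {"nsp3": 5835, "S": 3822}


def expected_substitutions(region: str, years: float) -> float:
    """Expected substitutions along one lineage: rate x sites x time."""
    return RATES[region] * REGION_LENGTH_NT[region] * years


if __name__ == "__main__":
    for region in RATES:
        per_year = expected_substitutions(region, 1.0)
        four_months = expected_substitutions(region, 4 / 12)
        print(f"{region}: {per_year:.2f} per year, {four_months:.2f} in four months")
```

Under these assumptions, each region accumulates only a handful of substitutions per lineage per year, which is consistent with the low phylogenetic signal reported.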

In conclusion, the results obtained in this work on the variable diversity of crucial viral regions and on the evolutionary rate are decisive for understanding essential features of viral emergence. In turn, these findings may allow the evolutionary rate of the S protein, which is crucial for vaccine development, to be characterized for the first time.

Article activity feed

  1. SciScore for 10.1101/2020.07.21.212860:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentences | Resources
    After elimination of sequences with indeterminate or ambiguous positions, the number of analyzed sequences for each region were: nsp1, 1608; nsp3, 1511; nsp14, 1550; S, 1488; Orf3a, 1600; E, 1615; Orf6, 1616; Orf8, 1612; and N, 1610. | nsp1 (suggested: None)
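The filtering step quoted above (dropping sequences with indeterminate or ambiguous positions) might look like the following minimal sketch. The function names and the choice to accept only A, C, G, and T are assumptions for illustration; the manuscript's actual criteria, for example how alignment gaps were handled, are not given here.

```python
# ASSUMPTION: only these characters count as determinate; gaps ("-") and
# IUPAC ambiguity codes such as N, R, or Y cause a sequence to be dropped.
UNAMBIGUOUS = frozenset("ACGT")


def is_unambiguous(sequence: str) -> bool:
    """True if the sequence is non-empty and contains only A, C, G, T."""
    return bool(sequence) and set(sequence.upper()) <= UNAMBIGUOUS


def filter_sequences(records: dict) -> dict:
    """Keep only the records (name -> sequence) without ambiguous positions."""
    return {name: seq for name, seq in records.items() if is_unambiguous(seq)}
```

Applied separately to each region's alignment, a filter of this kind would yield a different number of retained sequences per region, as in the counts quoted above.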
    Software and Algorithms
    Sentences | Resources
    Complete genomes were aligned using MAFFT against the Wuhan-Hu-1 reference genome (NC_045512.2, EPI_ISL_402125). | MAFFT (suggested: MAFFT, RRID:SCR_011811)
    The best-fit evolutionary model to each dataset was selected based on the Bayesian Information Criterion obtained with the JModelTest v2.1.10 software [18]. | JModelTest (suggested: jModelTest, RRID:SCR_015244)
    Phylogenetic trees were constructed using Bayesian inference with MrBayes v3.2.7a [19]. | MrBayes (suggested: MrBayes, RRID:SCR_012067)
    Phylogenetic trees were visualized with FigTree v1.4.4. | FigTree (suggested: FigTree, RRID:SCR_008515)
    Evolutionary rate: The estimation of the nucleotide evolutionary rate was made with the Beast v1.10.4 program package [21]. | Beast (suggested: BEAST, RRID:SCR_010228)
    The convergence of the “meanRate” parameters [effective sample size (ESS) ≥ 200, burn-in 10%] was verified with Tracer v1.7.1 [20]. | Tracer (suggested: Tracer, RRID:SCR_019121)
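The convergence criterion quoted above (ESS ≥ 200 after a 10% burn-in) can be illustrated with a simple autocorrelation-based estimate. This is a sketch under stated assumptions, not Tracer's implementation: it sums the sample autocorrelations up to the first non-positive lag, so its values may differ from what Tracer reports, and all function names are hypothetical.

```python
def discard_burn_in(trace, fraction=0.10):
    """Drop the first `fraction` of the MCMC samples."""
    n_drop = int(len(trace) * fraction)
    return trace[n_drop:]


def effective_sample_size(samples):
    """ESS = n / (1 + 2 * sum of positive-lag autocorrelations).

    ASSUMPTION: the sum is truncated at the first non-positive
    autocorrelation; Tracer's own estimator may differ.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def has_converged(trace, threshold=200.0, burn_in=0.10):
    """Apply the ESS >= threshold rule after discarding the burn-in."""
    return effective_sample_size(discard_burn_in(trace, burn_in)) >= threshold
```

In practice the trace would be the "meanRate" column of a BEAST log file; a strongly autocorrelated chain yields an ESS far below the number of logged states and would fail the threshold.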

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Despite the limitations of an evolutionary study of an emerging virus, where selection pressures are still low and variability is therefore also low, this work has a great strength: the extremely careful selection of a large sequence dataset to be analyzed. First, sequences were selected to have a good temporal signal and spatial (geographic) structure. Second, much attention was paid to eliminating sequences with low coverage and indeterminacies that could generate noise in the phylogenetic analysis of a virus that is beginning to evolve in a new host. The appearance of a new virus means an adaptation challenge. SARS-CoV-2 has overcome the spillover stage and shows significantly higher spread than SARS-CoV and MERS-CoV, thus causing the most important pandemic of the century. In this context, the results obtained in this work on the variable diversity of nine crucial viral regions and on the evolutionary rate are decisive for understanding essential features of viral emergence. Nevertheless, monitoring of the SARS-CoV-2 population will be required to determine the evolutionary course of new mutations as well as to understand the way they affect viral fitness in human hosts.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.