Phylogenetic Analysis Of SARS-CoV-2 In The First Months Since Its Emergence
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
During the first months of SARS-CoV-2 evolution in a new host, contrasting hypotheses have been proposed about the way the virus has evolved and diversified worldwide. The aim of this study was to perform a comprehensive evolutionary analysis to describe the human outbreak and the evolutionary rate of different genomic regions of SARS-CoV-2.
Molecular evolution in nine genomic regions of SARS-CoV-2 was analyzed using three approaches: phylogenetic signal assessment, emergence of amino acid substitutions, and Bayesian evolutionary rate estimation in eight successive fortnights since the emergence of the virus.
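As an illustration of how sequences might be grouped into successive fortnights, here is a minimal sketch. It assumes a fortnight is a fixed 14-day window counted from a start date; the authors may have defined the windows differently (for example, as calendar half-months). The start date, the sample dates, and the helper name `fortnight_index` are hypothetical and do not come from the study.

```python
from collections import Counter
from datetime import date


def fortnight_index(collection_date: date, start: date) -> int:
    """Return the 1-based index of the 14-day window containing collection_date."""
    elapsed_days = (collection_date - start).days
    if elapsed_days < 0:
        raise ValueError("collection date precedes the start of the series")
    return elapsed_days // 14 + 1


# ASSUMPTION: hypothetical start date and collection dates, for illustration only.
START = date(2019, 12, 24)
samples = [date(2019, 12, 30), date(2020, 1, 7), date(2020, 1, 20), date(2020, 2, 4)]

# Count how many sequences fall into each fortnight.
counts = Counter(fortnight_index(d, START) for d in samples)
for fortnight in sorted(counts):
    print(f"fortnight {fortnight}: {counts[fortnight]} sequence(s)")
```

Binning by a fixed window keeps each time slice the same length, which makes per-slice comparisons of diversity easier to interpret.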
All observed phylogenetic signals were very low, and tree topologies were consistent with those signals. However, after four months of evolution, some regions showed an incipient formation of viral lineages from fortnight 3 onward, despite the low phylogenetic signal. Finally, the SARS-CoV-2 evolutionary rates for nsp3 and S, the regions with the greatest variability, were estimated at 1.37 × 10⁻³ and 2.19 × 10⁻³ substitutions/site/year, respectively.
In conclusion, the results obtained in this work on the variable diversity of crucial viral regions and the determination of the evolutionary rate are decisive for understanding essential features of viral emergence. In turn, these findings may allow the evolutionary rate of the S protein, which is crucial for vaccine development, to be characterized for the first time.
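To give a feel for what a rate in substitutions/site/year means, the sketch below multiplies each reported rate by a region length to get the expected number of substitutions per year across that region. The region lengths are an assumption: they are the full-length nsp3 and S coordinates from the Wuhan-Hu-1 annotation (NC_045512.2), not values stated in the abstract, and the alignments actually analyzed may have covered different spans.

```python
# Rates reported in the abstract, in substitutions/site/year.
RATES = {"nsp3": 1.37e-3, "S": 2.19e-3}

# ASSUMPTION: full-length region sizes in nucleotides, taken from the
# Wuhan-Hu-1 annotation (NC_045512.2); not stated in the source text.
REGION_LENGTHS = {"nsp3": 5835, "S": 3822}


def expected_substitutions_per_year(rate: float, length_nt: int) -> float:
    """Expected substitutions per year across a region of length_nt sites."""
    return rate * length_nt


expected = {
    region: expected_substitutions_per_year(RATES[region], REGION_LENGTHS[region])
    for region in RATES
}
for region, value in expected.items():
    print(f"{region}: {value:.2f} expected substitutions/year")
```

Under these assumed lengths, both regions come out at roughly eight expected substitutions per year, even though the per-site rate of S is higher, because nsp3 is the longer region.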
Article activity feed
SciScore for 10.1101/2020.07.21.212860:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Experimental Models: Organisms/Strains
- Sentence: After elimination of sequences with indeterminate or ambiguous positions, the number of analyzed sequences for each region were: nsp1, 1608; nsp3, 1511; nsp14, 1550; S, 1488; Orf3a, 1600; E, 1615; Orf6, 1616; Orf8, 1612; and N, 1610. Resource: nsp1 (suggested: None)
Software and Algorithms
- Sentence: Complete genomes were aligned using MAFFT against the Wuhan-Hu-1 reference genome (NC_045512.2, EPI_ISL_402125). Resource: MAFFT (suggested: MAFFT, RRID:SCR_011811)
- Sentence: The best-fit evolutionary model to each dataset was selected based on the Bayesian Information Criterion obtained with the JModelTest v2.1.10 software [18]. Resource: JModelTest (suggested: jModelTest, RRID:SCR_015244)
- Sentence: Phylogenetic trees were constructed using Bayesian inference with MrBayes v3.2.7a [19]. Resource: MrBayes (suggested: MrBayes, RRID:SCR_012067)
- Sentence: Phylogenetic trees were visualized with FigTree v1.4.4. Resource: FigTree (suggested: FigTree, RRID:SCR_008515)
- Sentence: Evolutionary rate: The estimation of the nucleotide evolutionary rate was made with the Beast v1.10.4 program package [21]. Resource: Beast (suggested: BEAST, RRID:SCR_010228)
- Sentence: The convergence of the “meanRate” parameters [effective sample size (ESS) ≥ 200, burn-in 10%] was verified with Tracer v1.7.1 [20]. Resource: Tracer (suggested: Tracer, RRID:SCR_019121)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
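The filtering step mentioned in the Resources table (elimination of sequences with indeterminate or ambiguous positions) could look something like the sketch below. This is not the authors' code, and OddPub reports that none was detected. It assumes that "ambiguous" means any character other than A, C, G, T, or an alignment gap; the helper names and the example records are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_unambiguous(sequence, allowed="ACGT-"):
    """True if the sequence contains only the allowed characters.

    ASSUMPTION: gaps ('-') are tolerated because the input is an alignment;
    N and the other IUPAC ambiguity codes cause the sequence to be rejected.
    """
    return set(sequence.upper()) <= set(allowed)


# Hypothetical records: seq_b contains N, seq_c contains the IUPAC code R.
example = """>seq_a
ACGTACGT
>seq_b
ACGNACGT
>seq_c
ACG-ACRT
"""

kept = [(h, s) for h, s in parse_fasta(example) if is_unambiguous(s)]
print([h for h, _ in kept])
```

Filtering per region rather than per genome would explain why the table reports a different sequence count for each region, although the source does not say how the filter was applied.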
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Despite limitations of the evolutionary study of an emerging virus, where the selection pressures are still low and therefore its variability is also low, this work has a great strength: it lies on the extremely careful selection of a big sequence dataset to be analyze. First, it was considered selected sequences to have a good temporal signal and spatial (geographic) structure. Secondly, much attention was paid to the elimination of sequences with low coverage and indeterminacies that could generate a noise for the phylogenetic analysis of a virus that is beginning to evolve in a new host. The appearance of a new virus means an adaptation challenge. The SARS-CoV-2 overcome the spill stage and shows a significantly higher spread than SARS-CoV and MERS-CoV, thus becoming itself the most important pandemic of the century. In this context, the results obtained in this work about the variable diversity of nine crucial viral regions and the determination of the evolutionary rate, are consequently decisive to understanding essential feature of viral emergence. Nevertheless, monitoring SARS-CoV-2 population will be required to determine the evolutionary course of new mutations as well as to understand the way they affect viral fitness in human hosts.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.