Comprehensive evolution and molecular characteristics of a large number of SARS-CoV-2 genomes revealed its epidemic trend and possible origins
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Objectives
To reveal the epidemic trend and possible origins of SARS-CoV-2, which has infected millions of people and spread quickly all over the world, by exploring its evolution and molecular characteristics across a large number of genomes.
Methods
Various evolutionary analysis methods were employed.
Results
The estimated Ka/Ks ratio of SARS-CoV-2 is 1.008 or 1.094, based on 622 or 3624 SARS-CoV-2 genomes respectively, and the time to the most recent common ancestor (tMRCA) was inferred to be late September 2019. Furthermore, nine key specific sites in high linkage and four major haplotypes (H1, H2, H3 and H4) were found. The Ka/Ks ratios, detected population sizes and development trends of each major haplotype showed that the H3 and H4 subgroups were undergoing purifying evolution and had almost disappeared after detection, indicating that H3 and H4 might have existed for a long time, while the H1 and H2 subgroups were undergoing near-neutral or neutral evolution and increased globally over time. Notably, the frequency of H1 was generally high in Europe and correlated with death rate (r>0.37).
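The Ka/Ks values above are read against the conventional reference point of 1: ratios below 1 are usually taken to suggest purifying selection, ratios near 1 neutral evolution, and ratios above 1 positive selection. The following is a minimal Python sketch of that reading, not the authors' method. The function name `classify_ka_ks` and the `tolerance` cutoff of 0.1 are assumptions introduced for illustration; the abstract does not state a threshold for "near neutral".

```python
def classify_ka_ks(ratio: float, tolerance: float = 0.1) -> str:
    """Label a Ka/Ks ratio by the conventional reading around 1.

    `tolerance` is a hypothetical cutoff chosen for illustration only;
    the abstract does not define what counts as "near neutral".
    """
    if ratio < 0:
        raise ValueError("Ka/Ks cannot be negative")
    if abs(ratio - 1.0) <= tolerance:
        return "near neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Genome-wide estimates quoted in the abstract (622 and 3624 genomes).
for n_genomes, ratio in [(622, 1.008), (3624, 1.094)]:
    print(n_genomes, ratio, classify_ka_ks(ratio))
```

Under this assumed cutoff both genome-wide estimates fall in the "near neutral" band. A stricter tolerance would label 1.094 differently, so the cutoff should be treated as a free parameter rather than a result.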
Conclusions
In this study, the evolution and molecular characteristics of more than 16000 genomic sequences provided a new perspective on the epidemiology of SARS-CoV-2.
Article activity feed
SciScore for 10.1101/2020.04.24.058933:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: Finally, 624 high quality genomes with precise collection time were selected and aligned using MAFFT v7 with automatic parameters.
  Resources: MAFFT, suggested: (MAFFT, RRID:SCR_011811)
- Sentence: Estimate of evolution rate and the time to the most recent common ancestor for SARS-CoV, MERS-CoV, and SARS-CoV-2: The average Ka, Ks and Ka/Ks for all coding sequences were calculated using KaKs_Calculator v1.2 (Zhang, et al., 2006), and the substitution rate and tMRCA were estimated using BEAST v2.6.2 (Bouckaert, et al., 2019).
  Resources: KaKs_Calculator, suggested: None; BEAST, suggested: (BEAST, RRID:SCR_010228)
- Sentence: The temporal signal with root-to-tip divergence was visualized in TempEst v1.5.3 (Rambaut, et al., 2016) using a ML whole genome tree with bootstrap value as input.
  Resources: TempEst, suggested: (TempEst, RRID:SCR_017304)
- Sentence: The output was examined in Tracer v1.6 (http://tree.bio.ed.ac.uk/software/tracer/).
  Resources: Tracer, suggested: (Tracer, RRID:SCR_019121)
- Sentence: Variants calling of SARS-CoV-2 genome sequences: Each genome sequence was aligned to the reference genome (NC_045512.2) using bowtie2 with default parameters (Langmead and Salzberg, 2012), and variants were called by samtools (sort; mpileup -gf) and bcftools (call -vm).
  Resources: bowtie2, suggested: (Bowtie 2, RRID:SCR_016368); samtools, suggested: (SAMTOOLS, RRID:SCR_002105)
- Sentence: The SMS method was used to select GTR+G as the base substitution model (Lefort, et al., 2017), and the PhyML 3.1 (Guindon, et al., 2010) and MEGA (Kumar, et al., 2018) were used to construct the no-root phylogenetic tree by the maximum likelihood method with the bootstrap value of 100.
  Resources: PhyML, suggested: (PhyML, RRID:SCR_014629)
- Sentence: Phylogenetic network of haplotype subgroups: The phylogenetic networks were inferred by PopART package v1.7.2 (Leigh, et al., 2015) using TCS and minimum spanning network (MSN) methods respectively.
  Resources: PopART, suggested: None
Results from OddPub: We did not detect open data. We also did not detect open code.
Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.