Global genetic diversity patterns and transmissions of SARS-CoV-2
Listed in: Evaluated articles (ScreenIT)
Abstract
Background
Since it was first discovered in China, the SARS-CoV-2 epidemic has caused a substantial health emergency and economic stress worldwide. However, its global genetic diversity and transmission patterns remain unclear.
Methods
A total of 3050 SARS-CoV-2 genome sequences were retrieved from the GISAID database. After alignment with MAFFT, mutation patterns were identified by phylogenetic tree analysis.
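As a rough illustration of the tree-building step, the sketch below clusters aligned sequences by average linkage (UPGMA), the method quoted in the resources table further down this page. It is not the authors' MEGA X workflow: the Hamming distance, the function names `hamming` and `upgma`, and the toy sequences are all assumptions made for the example, and bootstrap resampling is omitted.

```python
from itertools import combinations


def hamming(a, b):
    """Count the alignment columns at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def upgma(names, seqs):
    """Cluster aligned sequences by average linkage (UPGMA).

    Returns the merge order as a nested tuple, e.g. (("a", "b"), "c").
    Distances are plain Hamming distances (an assumption for this sketch).
    """
    # Map cluster id -> (subtree, number of leaves in the subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by the unordered pair of cluster ids.
    dist = {
        frozenset((i, j)): float(hamming(seqs[i], seqs[j]))
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Closest pair; ties are broken by the smaller cluster ids.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[pair]
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean of the two old distances (the UPGMA update).
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Toy aligned sequences (made up, not real SARS-CoV-2 data).
names = ["a", "b", "c", "d"]
seqs = ["AAAAAA", "AAAAAT", "TTTAAA", "TTTAAT"]
tree = upgma(names, seqs)
```

In this toy input, `a` and `b` differ at one column, as do `c` and `d`, so the two pairs merge first and are joined last. A real analysis would use an evolutionary distance model and bootstrap support, which this sketch does not attempt.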
Results
We detected 17 high-frequency (>6%) mutations in the 3050 sequences. Based on these mutations, we classified SARS-CoV-2 into four main groups and 10 subgroups. Group A was found mainly in Asia, group B primarily in North America, group C predominantly in Asia and Oceania, and group D principally in Europe and Africa. The distribution of these groups differed by age but was similar between genders. Groups A, B1 and C2 declined over time, whereas groups B2, C3 and D rose. Finally, we found two apparent expansion stages (late January 2020, and late February to early March 2020). Notably, most groups are expanding quickly, especially group D.
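The frequency cutoff used above can be illustrated with a minimal sketch that counts substitutions against a reference in an existing alignment and keeps those above 6%. The function name `high_frequency_mutations`, the handling of gaps and ambiguous bases, and the toy sequences are assumptions for illustration; the source does not describe how the authors computed frequencies.

```python
from collections import Counter


def high_frequency_mutations(reference, aligned, threshold=0.06):
    """Return {(position, ref_base, alt_base): frequency} for substitutions
    whose frequency across `aligned` exceeds `threshold`.

    All sequences must have the same length as `reference` (one multiple
    sequence alignment). Positions are 1-based alignment columns. Gaps and
    ambiguous bases are skipped (an assumption for this sketch).
    """
    counts = Counter()
    for seq in aligned:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the same length")
        for pos, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            if base != ref_base and base in "ACGT" and ref_base in "ACGT":
                counts[(pos, ref_base, base)] += 1
    total = len(aligned)
    return {m: n / total for m, n in counts.items() if n / total > threshold}


# Toy alignment (made up, not real SARS-CoV-2 data).
reference = "ACGTACGTAC"
aligned = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ATGTACGTAC",
    "ATGTACGTAC",
    "ACGTACGTAN",  # the trailing N is ambiguous and is ignored
]
hits = high_frequency_mutations(reference, aligned, threshold=0.06)
```

Here the only substitution kept is C to T at column 2, carried by two of the five toy sequences. Each sequence's pattern of presence or absence at the retained sites could then serve as the basis for grouping, although the source does not spell out that step.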
Conclusions
We classified SARS-CoV-2 into four main groups and 10 subgroups based on distinct mutation patterns for the first time. The distribution of the 10 subgroups differed by geography, time and age, but not by gender. Most groups are expanding rapidly, especially group D. Therefore, attention should be paid to these genetic diversity patterns of SARS-CoV-2, and more targeted measures should be taken to constrain its spread.
Article activity feed
SciScore for 10.1101/2020.05.05.20091413: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement: not detected.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Sex as a biological variable: not detected.
Table 2: Resources
Software and Algorithms

Sentences: The complete genome sequences were aligned with the reference genome of SARS-CoV-2 (NC_045512) by MAFFT (version 7).
Resources: MAFFT, suggested: (MAFFT, RRID:SCR_011811)

Sentences: The aligned genomes were then edited manually according to the reference sequence by BioEdit software.
Resources: BioEdit, suggested: (BioEdit, RRID:SCR_007361)

Sentences: For ease to visualize the mutations, heatmap was drew by pheatmap package in R language.
Resources: pheatmap, suggested: (pheatmap, RRID:SCR_016418)

Sentences: Average linkage (UPGMA) method with bootstrap value of 100 replicates was used for the construction of phylogenetic tree using MEGA X software.
Resources: MEGA X, suggested: None

Results from OddPub: Thank you for sharing your data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Our study has some limitations due to the nature of the SARS-CoV-2 genome data. First, the sample collection dates may not reflect the actual infection date and not all infected countries uploaded the sequences timely to the GISAID database, so the mutation patterns transmission analysis is only approximate. Second, because some countries have not sequenced enough virus samples (such as Africa) or some countries uploaded samples from single-source (such as the Diamond Princess cruise ship in Japan), the mutation pattern may be biased in specific country or continent. Nevertheless, our study had been included the most complete available SARS-CoV-2 sequences to date. Whether the different mutation patterns of SARS-CoV-2 will result in biological and clinical differences remains to be determined. In conclusion, we classed the SARS-CoV-2 into four main groups and 10 subgroups based on different mutation patterns for the first time. The distribution of the 10 subgroups was varied in geography, time and age, but not in gender. After two apparent expansion stages, most of groups are rapidly expanding, especially group D. Therefore, we should pay attention to these genetic diversity patterns of SARS-CoV-2 and take more targeted measures to control its spread.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
