Genomic, geographic and temporal distributions of SARS-CoV-2 mutations
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The COVID-19 pandemic is the most significant public health issue in recent history. Its causal agent, SARS-CoV-2, has evolved rapidly since it first emerged in December 2019. Mutations in the viral genome have critical impacts on the adaptation of viral strains to the local environment, and may alter the characteristics of viral transmission, disease manifestation, and the efficacy of treatment and vaccination. Using the complete sequences of 1,932 SARS-CoV-2 genomes, we examined the genomic, geographic and temporal distributions of aged, new, and frequent mutations of SARS-CoV-2, and identified six phylogenetic clusters of strains, which also exhibit geographic preferences across continents. Mutations in the form of single nucleotide variations (SNVs) provide a direct interpretation of the six phylogenetic clusters. Linkage disequilibrium, haplotype structure, evolutionary process, and the global distribution of mutations together provide a sketch of the mutational history. Additionally, we found a positive correlation between the average mutation count and case fatality, and this correlation strengthened over time, suggesting an important role of SNVs in disease outcomes. This study suggests that SNVs may become an important consideration in virus detection, clinical treatment, drug design, and vaccine development to avoid target shifting, and that continued isolation and sequencing are crucial components in the fight against this pandemic.
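To make the notion of an SNV concrete, the following is a minimal sketch of how single nucleotide differences could be tallied from sequences that have already been aligned to a reference. It is not the authors' pipeline. The function names `call_snvs` and `snv_frequencies`, the toy sequences, and the choice to skip gaps and ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def call_snvs(reference, sample):
    """Return (position, ref_base, alt_base) for each single-nucleotide difference.

    Positions are 1-based alignment columns. Columns where either sequence
    has a gap or an ambiguous base are skipped (an assumption of this sketch).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snvs = []
    for pos, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper()), start=1
    ):
        if ref_base in VALID_BASES and alt_base in VALID_BASES and ref_base != alt_base:
            snvs.append((pos, ref_base, alt_base))
    return snvs


def snv_frequencies(reference, samples):
    """Map each SNV to the fraction of sample genomes that carry it."""
    counts = Counter()
    for sequence in samples.values():
        counts.update(call_snvs(reference, sequence))
    total = len(samples)
    return {snv: count / total for snv, count in counts.items()}


# Hypothetical toy alignment, not real SARS-CoV-2 data.
reference = "ACGTACGTAC"
samples = {
    "strain_a": "ACGTACGTAC",
    "strain_b": "ATGTACGTAC",
    "strain_c": "ATGTACGAAC",
    "strain_d": "ACGTAC-TAC",
}

freqs = snv_frequencies(reference, samples)
for (pos, ref_base, alt_base), freq in sorted(freqs.items()):
    print(f"{ref_base}{pos}{alt_base}: {freq:.2f}")
```

In this toy alignment, an SNV shared by several strains shows up with a higher frequency, which is the kind of signal that separates frequent mutations from rare ones.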
Significance Statement
Mutation is the driving force of evolution for viruses like SARS-CoV-2, the causal agent of COVID-19. In this study, we discovered that the genome of SARS-CoV-2 is changing rapidly from the originally isolated form. These mutations have been spreading around the world and have caused more than 2.5 million infections and 170 thousand deaths. We found that fourteen frequent mutations identified in this study can characterize the six main clusters of SARS-CoV-2 strains. In addition, we found that the mutation burden is positively correlated with the fatality of COVID-19 patients. Understanding mutations in the SARS-CoV-2 genome will provide useful insight for the design of treatment and vaccination.
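A correlation between mutation burden and fatality of the kind described here could be quantified with a Pearson coefficient. The sketch below is one possible way to compute it and is not taken from the paper, which does not state its exact procedure in this excerpt. The helper `pearson_r` and the numbers in the two lists are invented placeholders, not results from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / math.sqrt(var_x * var_y)


# Placeholder values for illustration only: one entry per hypothetical region.
toy_mean_mutation_count = [2.0, 3.5, 4.0, 6.0]
toy_case_fatality_percent = [1.0, 2.5, 2.0, 5.0]

r = pearson_r(toy_mean_mutation_count, toy_case_fatality_percent)
print(f"Pearson r = {r:.3f}")
```

Repeating the calculation on data from successive time windows would show whether the coefficient grows over time, as the abstract reports.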
Article activity feed
SciScore for 10.1101/2020.04.22.055863:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Institutional Review Board Statement: not detected.
- Randomization: not detected.
- Blinding: not detected.
- Power Analysis: not detected.
- Sex as a biological variable: not detected.
Table 2: Resources
Software and Algorithms
- Sentence: Multiple sequence alignment was performed by using MUSCLE (4). Resource: MUSCLE, suggested: (MUSCLE, RRID:SCR_011812)
- Sentence: Mutation frequencies in gene regions were illustrated and coefficient of LD (R2) between pairs of nucleotides was calculated by using PLINK (6). Resource: PLINK, suggested: (PLINK, RRID:SCR_001757)
- Sentence: Phylogenetic tree analysis was performed by using MEGA X (7). Resource: MEGA, suggested: (Mega BLAST, RRID:SCR_011920)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
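The coefficient of LD (R2) named in the resources above can be illustrated for a pair of biallelic sites. The sketch below uses the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB, applied to haploid genomes coded as 0 (reference allele) and 1 (alternate allele). It is an assumption-laden illustration and not PLINK's implementation; the function name `ld_r2` and the toy allele vectors are hypothetical.

```python
def ld_r2(site_a, site_b):
    """Squared correlation (r^2) between two biallelic sites in haploid genomes.

    Each argument is a list of 0/1 alleles, one entry per genome, in the same
    genome order. Returns None when either site is monomorphic, because r^2
    is undefined in that case.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and cover the same genomes")
    p_a = sum(site_a) / n  # frequency of the alternate allele at site A
    p_b = sum(site_b) / n  # frequency of the alternate allele at site B
    # Frequency of genomes carrying the alternate allele at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return None
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denominator


# Toy allele vectors for four hypothetical genomes.
fully_linked = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
independent = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])
print(fully_linked, independent)
```

Two mutations that always occur together give r² = 1, while mutations that occur independently give r² = 0; pairs with high r² are the ones that tend to define a shared haplotype.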
SciScore for 10.1101/2020.04.22.055863:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: Multiple sequence alignment was performed by using MUSCLE (4). Resource: MUSCLE, suggested: (MUSCLE, SCR_011812)
- Sentence: Mutation frequencies in gene regions were illustrated and coefficient of LD (R2) between pairs of nucleotides was calculated by using PLINK (6). Resource: PLINK, suggested: (PLINK, SCR_001757)
- Sentence: Phylogenetic tree analysis was performed by using MEGA X (7). Resource: MEGA, suggested: (Mega BLAST, SCR_011920)
Results from OddPub: We did not find a statement about open data. We also did not find a statement about open code. Researchers are encouraged to share open data when possible (see Nature blog).
About SciScore
SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria it detects are appropriate for the particular study. Instead, it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources.
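As a rough illustration of what checking for the presence of RRIDs can look like, the sketch below pulls RRID-style identifiers out of a sentence with a regular expression. This is not how SciScore works internally, which is not described here. The pattern, the function name `find_rrids`, and the example sentence are assumptions for illustration, and the pattern only covers identifiers of the form `RRID:PREFIX_ID`.

```python
import re

# Assumed pattern: "RRID:" followed by a letter prefix, an underscore, and an ID.
RRID_PATTERN = re.compile(r"\bRRID:\s?([A-Za-z]+_[A-Za-z0-9_-]+)")


def find_rrids(text):
    """Return RRID identifiers (without the 'RRID:' prefix) in order of appearance."""
    return RRID_PATTERN.findall(text)


example = (
    "Alignment used MUSCLE (MUSCLE, RRID:SCR_011812) and LD was computed "
    "with PLINK (PLINK, RRID:SCR_001757)."
)
rrids = find_rrids(example)
print(rrids)
```

A sentence that names a software tool but returns an empty list from such a check is a candidate for a missing RRID.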
