Genomic Diversity and Hotspot Mutations in 30,983 SARS-CoV-2 Genomes: Moving Toward a Universal Vaccine for the “Confined Virus”?
Listed in
- Evaluated articles (ScreenIT)
Abstract
The COVID-19 pandemic has been ongoing since its onset in late November 2019 in Wuhan, China. Understanding and monitoring the genetic evolution of the virus, its geographical characteristics, and its stability are particularly important for controlling the spread of the disease and especially for developing a universal vaccine covering all circulating strains. From this perspective, we analyzed 30,983 complete SARS-CoV-2 genomes from 79 countries across six continents, collected from 24 December 2019 to 13 May 2020, as recorded in the GISAID database. Our analysis revealed 3206 variant sites, with a uniform distribution of mutation types across geographic areas. Remarkably, recurrent mutations were infrequent: only 169 mutations (5.27%) had a prevalence greater than 1% of genomes. Nevertheless, fourteen non-synonymous hotspot mutations (prevalence >10%) were identified at different locations along the viral genome: eight in the ORF1ab polyprotein (in nsp2, nsp3, transmembrane domain, RdRp, helicase, exonuclease, and endoribonuclease), three in the nucleocapsid protein, and one in each of three proteins: Spike, ORF3a, and ORF8. Moreover, 36 non-synonymous mutations were identified in the receptor-binding domain (RBD) of the spike protein with a low prevalence (<1%) across all genomes, of which only four could potentially enhance the binding of the SARS-CoV-2 spike protein to the human ACE2 receptor. These results, along with the intra-genomic divergence of SARS-CoV-2, could indicate that, unlike influenza virus or HIV, SARS-CoV-2 has a low mutation rate, which makes the development of an effective global vaccine very likely.
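The prevalence thresholds quoted in the abstract (greater than 1% for recurrent mutations, greater than 10% for hotspots) can be illustrated with a minimal Python sketch. The function names, the three-way labels, and the per-mutation counts below are hypothetical and chosen only for illustration; only the totals 30,983, 3206, and 169 come from the abstract, and the authors' own scripts are not shown in the source.

```python
def prevalence(count: int, total_genomes: int) -> float:
    """Return the fraction of genomes that carry a given mutation."""
    if total_genomes <= 0:
        raise ValueError("total_genomes must be positive")
    return count / total_genomes


def classify_mutation(count: int, total_genomes: int,
                      recurrent_cutoff: float = 0.01,
                      hotspot_cutoff: float = 0.10) -> str:
    """Label a mutation by prevalence, using the abstract's 1% and 10% cutoffs.

    Note: the abstract's 169 mutations above 1% include the hotspots; here
    "recurrent" means above 1% but not above 10% (an illustrative choice).
    """
    p = prevalence(count, total_genomes)
    if p > hotspot_cutoff:
        return "hotspot"
    if p > recurrent_cutoff:
        return "recurrent"
    return "rare"


TOTAL_GENOMES = 30_983  # number of genomes, from the abstract

# Hypothetical genome counts per mutation, invented for illustration only.
example_counts = {"mutation_A": 20_000, "mutation_B": 1_500, "mutation_C": 40}

labels = {name: classify_mutation(n, TOTAL_GENOMES)
          for name, n in example_counts.items()}

# Share of variant sites with prevalence above 1%, from the abstract's figures.
recurrent_share = round(100 * 169 / 3206, 2)

print(labels)
print(recurrent_share)
```

The last computation reproduces the 5.27% figure in the abstract as 169 out of 3206 variant sites.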
Article activity feed
SciScore for 10.1101/2020.06.20.163188:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: Variant calling analysis: Genome sequences were mapped to the reference sequence Wuhan-Hu-1/2019 (Genbank ID: NC_045512.2) using Minimap v2.12-r847 [34].
  Resource: Minimap (suggested: None)
- Sentence: The final sorted BAM files were used to call the genetic variants in variant call format (VCF) by SAMtools mpileup and BCFtools [35].
  Resource: SAMtools (suggested: SAMTOOLS, RRID:SCR_002105)
- Sentence: For that, the SnpEff databases were first built locally using annotations of the reference sequence Wuhan-Hu-1/2019 obtained in the GFF format from NCBI database.
  Resources: SnpEff (suggested: SnpEff, RRID:SCR_005191); NCBI (suggested: NCBI, RRID:SCR_006472)
- Sentence: Comparative analysis of D614 (wild type) and G614 (mutant) interactions with their surrounding residues was done in PyMOL 2.3 (Schrodinger L.L.C).
  Resource: PyMOL (suggested: PyMOL, RRID:SCR_000305)
- Sentence: The tree was constructed in IQ-TREE v1.5.5 [41] using the maximum likelihood method under the GTR model.
  Resource: IQ-TREE (suggested: IQ-TREE, RRID:SCR_017254)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
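The variant-calling steps quoted in Table 2 (mapping to NC_045512.2, sorting, pileup, and calling) can be sketched as a list of command lines assembled in Python. This is a rough sketch under stated assumptions, not the authors' pipeline: the file names are hypothetical, the exact options used in the study are not given in the source, and `bcftools mpileup` is swapped in for the SAMtools mpileup step named in the quoted sentence. The SnpEff annotation step is not covered.

```python
import shlex


def variant_calling_commands(reference: str, genomes: str, prefix: str) -> list[list[str]]:
    """Build (but do not run) commands for map -> sort -> pileup -> call.

    All file names are placeholders supplied by the caller.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    bcf = f"{prefix}.bcf"
    vcf = f"{prefix}.vcf"
    return [
        # Map genome sequences to the reference and write SAM output (-a).
        ["minimap2", "-a", "-o", sam, reference, genomes],
        # Sort alignments by coordinate and write a BAM file.
        ["samtools", "sort", "-o", bam, sam],
        # Compute genotype likelihoods (assumption: bcftools mpileup is used
        # here in place of the samtools mpileup step named in the source).
        ["bcftools", "mpileup", "-f", reference, "-O", "u", "-o", bcf, bam],
        # Call variants (-m multiallelic caller, -v variant sites only), VCF output.
        ["bcftools", "call", "-m", "-v", "-O", "v", "-o", vcf, bcf],
    ]


# Hypothetical input names, for illustration only.
commands = variant_calling_commands("NC_045512.2.fasta", "genomes.fasta", "sarscov2")
for cmd in commands:
    print(shlex.join(cmd))
```

The function only prints the commands; running them would require the tools to be installed and could be done with `subprocess.run(cmd, check=True)` for each entry.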
SciScore for 10.1101/2020.05.03.074567:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: The BAM files were sorted by SAMtools sort (9), then used to call the genetic variants in variant call format (VCF) by SAMtools mpileup (9) and bcftools v1.8 (9).
  Resource: SAMtools (suggested: SAMTOOLS, RRID:SCR_002105)
- Sentence: The final call set of the 3067 genomes was annotated and their impact was predicted using SnpEff v 4.3t (10).
  Resource: SnpEff (suggested: SnpEff, RRID:SCR_005191)
- Sentence: First, the SnpEff databases were built locally using annotations of the reference genome NC_045512.2 obtained in GFF format from the NCBI database.
  Resource: NCBI (suggested: NCBI, RRID:SCR_006472)
- Sentence: Phylogenetic analysis and geodistribution: The downloaded full-length genome sequences of coronaviruses isolated from different hosts from public databases were subjected to multiple sequence alignments using Muscle v 3.8 (
  Resource: Muscle (suggested: MUSCLE, RRID:SCR_011812)
- Sentence: Maximum-likelihood phylogenetic trees with 1000 bootstrap replicates were constructed using RaxML v 8.2.12 (39)).
  Resource: RaxML (suggested: RAxML, RRID:SCR_006086)
- Sentence: Selective pressure and modelling: We used Hyphy v2.5.8 (13) to estimate synonymous and non-synonymous ratio dN / dS (ω).
  Resource: Hyphy (suggested: HyPhy, RRID:SCR_016162)
- Sentence: The selected nucleotide sequences of each dataset were aligned using Clustalw codon-by-codon and the phylogenetic tree was obtained using ML (maximum likelihood) available in MEGA X (14).
  Resources: Clustalw (suggested: ClustalW, RRID:SCR_017277); MEGA (suggested: Mega BLAST, RRID:SCR_011920)
- Sentence: Structure visualization and image rendering were performed in PyMOL 2.3 (Schrodinger LLC).
  Resource: PyMOL (suggested: PyMOL, RRID:SCR_000305)
- Sentence: The strategy of best reciprocal BLAST results (18) was implemented to identify all of the orthologous genes using Proteinortho v6.0b (19)
  Resource: BLAST (suggested: BLASTX, RRID:SCR_001653)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
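The dN / dS ratio (ω) mentioned in the selective-pressure sentence of Table 2 is conventionally read as follows: ω below 1 suggests purifying selection, ω near 1 suggests neutral evolution, and ω above 1 suggests positive selection. The Python sketch below only illustrates that reading. It is not how HyPhy estimates ω (HyPhy fits codon models by likelihood), and the function names, the tolerance around 1, and the example rates are assumptions made for illustration.

```python
import math


def omega(dn: float, ds: float) -> float:
    """Return the ratio of non-synonymous (dN) to synonymous (dS) rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    if dn < 0:
        raise ValueError("dN must be non-negative")
    return dn / ds


def interpret_omega(w: float, tol: float = 0.05) -> str:
    """Give the conventional reading of omega; tol is an arbitrary band around 1."""
    if w < 0:
        raise ValueError("omega must be non-negative")
    if math.isclose(w, 1.0, abs_tol=tol):
        return "neutral"
    return "positive" if w > 1.0 else "purifying"


# Hypothetical rates, invented for illustration only.
w = omega(dn=0.02, ds=0.10)
print(round(w, 3), interpret_omega(w))
```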