Genomic Diversity and Hotspot Mutations in 30,983 SARS-CoV-2 Genomes: Moving Toward a Universal Vaccine for the “Confined Virus”?



Abstract

The COVID-19 pandemic has been ongoing since its onset in late November 2019 in Wuhan, China. Understanding and monitoring the genetic evolution of the virus, its geographical characteristics, and its stability are particularly important for controlling the spread of the disease and, above all, for developing a universal vaccine covering all circulating strains. From this perspective, we analyzed 30,983 complete SARS-CoV-2 genomes from 79 countries across six continents, collected from 24 December 2019 to 13 May 2020, as recorded in the GISAID database. Our analysis revealed 3206 variant sites, with a uniform distribution of mutation types across geographic areas. Remarkably, recurrent mutations were infrequent: only 169 mutations (5.27%) were present in more than 1% of genomes. Nevertheless, fourteen non-synonymous hotspot mutations (prevalence >10%) were identified at different locations along the viral genome: eight in the ORF1ab polyprotein (in nsp2, nsp3, transmembrane domain, RdRp, helicase, exonuclease, and endoribonuclease), three in the nucleocapsid protein, and one in each of three proteins: Spike, ORF3a, and ORF8. Moreover, 36 non-synonymous mutations were identified in the receptor-binding domain (RBD) of the spike protein, each with a low prevalence (<1%) across all genomes, of which only four could potentially enhance the binding of the SARS-CoV-2 spike protein to the human ACE2 receptor. These results, together with the intra-genomic divergence of SARS-CoV-2, could indicate that, unlike the influenza virus or HIV, SARS-CoV-2 has a low mutation rate, which makes the development of an effective global vaccine very likely.
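The prevalence thresholds in the abstract (more than 1% of genomes for a recurrent mutation, more than 10% for a hotspot) can be illustrated with a minimal Python sketch. The function name, the mutation labels, and the toy counts are hypothetical and are not taken from the study. The last line only checks the abstract's arithmetic, on the assumption that the 5.27% figure is the share of the 3206 variant sites.

```python
def classify_by_prevalence(mutation_counts, n_genomes,
                           recurrent_cutoff=0.01, hotspot_cutoff=0.10):
    """Return (recurrent, hotspot) mutation lists based on prevalence.

    mutation_counts maps a mutation label to the number of genomes carrying it.
    A hotspot (>10%) is also recurrent (>1%), so hotspot is a subset of recurrent.
    """
    recurrent, hotspot = [], []
    for mutation, count in mutation_counts.items():
        prevalence = count / n_genomes
        if prevalence > recurrent_cutoff:
            recurrent.append(mutation)
        if prevalence > hotspot_cutoff:
            hotspot.append(mutation)
    return recurrent, hotspot


# Hypothetical toy data: 1000 genomes and three made-up mutation labels.
toy_counts = {"mut_A": 700, "mut_B": 50, "mut_C": 3}
recurrent, hotspot = classify_by_prevalence(toy_counts, n_genomes=1000)

# Arithmetic check of the abstract: 169 of 3206 variant sites, in percent.
recurrent_share = round(100 * 169 / 3206, 2)
```

With the toy data, `mut_A` and `mut_B` exceed the 1% cutoff and only `mut_A` exceeds the 10% cutoff. `recurrent_share` evaluates to 5.27, matching the percentage reported in the abstract.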

Article activity feed

  1. SciScore for 10.1101/2020.06.20.163188:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Variant calling analysis: Genome sequences were mapped to the reference sequence Wuhan-Hu-1/2019 (Genbank ID: NC_045512.2) using Minimap v2.12-r847 [34].
    Resource: Minimap (suggested: None)

    Sentence: The final sorted BAM files were used to call the genetic variants in variant call format (VCF) by SAMtools mpileup and BCFtools [35].
    Resource: SAMtools (suggested: SAMTOOLS, RRID:SCR_002105)

    Sentence: For that, the SnpEff databases were first built locally using annotations of the reference sequence Wuhan-Hu-1/2019 obtained in the GFF format from NCBI database.
    Resource: SnpEff (suggested: SnpEff, RRID:SCR_005191)
    Resource: NCBI (suggested: NCBI, RRID:SCR_006472)

    Sentence: Comparative analysis of D614 (wild type) and G614 (mutant) interactions with their surrounding residues was done in PyMOL 2.3 (Schrodinger L.L.C).
    Resource: PyMOL (suggested: PyMOL, RRID:SCR_000305)

    Sentence: The tree was constructed in IQ-TREE v1.5.5 [41] using the maximum likelihood method under the GTR model.
    Resource: IQ-TREE (suggested: IQ-TREE, RRID:SCR_017254)
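
The mapping and variant-calling steps quoted in this table can be sketched as a sequence of command lines. This is a hedged illustration, not the authors' actual invocation: the file names and the function name are hypothetical, the option choices are assumptions, and `bcftools mpileup` is substituted for the SAMtools mpileup step named in the quoted sentence. The Python code only builds the argument lists and does not run any tool.

```python
def variant_calling_commands(reference, genomes, prefix):
    """Return the argument list for each pipeline step, in run order.

    All file names are hypothetical placeholders chosen for illustration.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    bcf = f"{prefix}.pileup.bcf"
    vcf = f"{prefix}.vcf.gz"
    return [
        # Map the genome sequences to the reference; -a requests SAM output.
        ["minimap2", "-a", "-o", sam, reference, genomes],
        # Coordinate-sort the alignments and write them as BAM.
        ["samtools", "sort", "-o", bam, sam],
        # Compute genotype likelihoods against the reference (uncompressed BCF).
        # Assumption: bcftools mpileup stands in for SAMtools mpileup here.
        ["bcftools", "mpileup", "-f", reference, "-Ou", "-o", bcf, bam],
        # Call variant sites only (-v) with the multiallelic caller (-m).
        ["bcftools", "call", "-m", "-v", "-Oz", "-o", vcf, bcf],
    ]


commands = variant_calling_commands(
    "NC_045512.2.fasta", "genomes.fasta", "sars_cov_2"
)
for cmd in commands:
    print(" ".join(cmd))
```

To execute the steps, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where minimap2, SAMtools, and BCFtools are installed.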

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.05.03.074567:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: The BAM files were sorted by SAMtools sort (9), then used to call the genetic variants in variant call format (VCF) by SAMtools mpileup (9) and bcftools v1.8 (9).
    Resource: SAMtools (suggested: SAMTOOLS, RRID:SCR_002105)

    Sentence: The final call set of the 3067 genomes was annotated and their impact was predicted using SnpEff v 4.3t (10).
    Resource: SnpEff (suggested: SnpEff, RRID:SCR_005191)

    Sentence: First, the SnpEff databases were built locally using annotations of the reference genome NC_045512.2 obtained in GFF format from the NCBI database.
    Resource: NCBI (suggested: NCBI, RRID:SCR_006472)

    Sentence: Phylogenetic analysis and geodistribution: The downloaded full-length genome sequences of coronaviruses isolated from different hosts from public databases were subjected to multiple sequence alignments using Muscle v 3.8 (
    Resource: Muscle (suggested: MUSCLE, RRID:SCR_011812)

    Sentence: Maximum-likelihood phylogenetic trees with 1000 bootstrap replicates were constructed using RaxML v 8.2.12 (39).
    Resource: RaxML (suggested: RAxML, RRID:SCR_006086)

    Sentence: Selective pressure and modelling: We used Hyphy v2.5.8 (13) to estimate synonymous and non-synonymous ratio dN / dS (ω).
    Resource: Hyphy (suggested: HyPhy, RRID:SCR_016162)

    Sentence: The selected nucleotide sequences of each dataset were aligned using Clustalw codon-by-codon and the phylogenetic tree was obtained using ML (maximum likelihood) available in MEGA X (14).
    Resource: Clustalw (suggested: ClustalW, RRID:SCR_017277)
    Resource: MEGA (suggested: Mega BLAST, RRID:SCR_011920)

    Sentence: Structure visualization and image rendering were performed in PyMOL 2.3 (Schrodinger LLC).
    Resource: PyMOL (suggested: PyMOL, RRID:SCR_000305)

    Sentence: The strategy of best reciprocal BLAST results (18) was implemented to identify all of the orthologous genes using Proteinortho v6.0b (19)
    Resource: BLAST (suggested: BLASTX, RRID:SCR_001653)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.
