Molecular Evolution of SARS-CoV-2 Structural Genes: Evidence of Positive Selection in Spike Glycoprotein


Abstract

SARS-CoV-2 caused a global pandemic in early 2020 and has so far resulted in more than 8,000,000 infections and 430,000 deaths worldwide. Four structural proteins, the envelope (E), membrane (M), nucleocapsid (N) and spike (S) glycoproteins, play a key role in controlling entry into human cells and virion assembly of SARS-CoV-2. However, how these genes evolve during human-to-human transmission is largely unknown. In this study, we screened and analyzed roughly 3090 SARS-CoV-2 isolates from the GenBank database. The number of alleles of each of the four genes was determined: 16 for E, 40 for M, 131 for N and 173 for S. Phylogenetic analysis shows that global SARS-CoV-2 isolates can be clustered into three to four major clades based on the protein sequences of these genes. No intragenic recombination event was detected among the different alleles, but purifying selection has acted on the evolution of these genes. Analysis of the full genomic sequences of these alleles using codon-substitution models (M8, M3 and M2a) and likelihood ratio tests (LRTs) in the codeML package reveals that codon 614 of the S glycoprotein has been subjected to strong positive selection pressure, and a persistent D614G mutation is identified. Positive selection of the D614G mutation is further confirmed by the internal fixed effects likelihood (IFEL) and Evolutionary Fingerprinting methods implemented in the HyPhy package. In addition, another potential positive selection site is identified at codon 5, in the signal sequence of the S protein. The allele containing the D614G mutation has undergone significant expansion during the SARS-CoV-2 global pandemic, implying better adaptability of isolates carrying the mutation. In contrast, expansion of the L5F allele is relatively restricted. The D614G mutation is located in subdomain 2 (SD2) of the C-terminal portion (CTP) of the S1 subunit. Protein structural modeling shows that the D614G mutation may disrupt a salt bridge between S protein monomers, increase their flexibility, and in turn promote receptor binding domain (RBD) opening, virus attachment and entry into host cells. Because it is located in the signal sequence of the S protein, the L5F mutation may facilitate protein folding, assembly, and secretion of the virus. This is the first evidence of positive Darwinian selection in the spike gene of SARS-CoV-2; it contributes to a better understanding of the adaptive mechanism of this virus and helps provide insights for developing novel therapeutic approaches and effective vaccines that target the mutation sites.
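
The likelihood ratio tests mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional codeML pairings of each selection model with a null model (M7 vs M8 and M1a vs M2a with 2 degrees of freedom, M0 vs M3 with 4 degrees of freedom for three site classes); the abstract itself names only M8, M3 and M2a, so the null models, the function names and the log-likelihood values below are all hypothetical and are not taken from the study.

```python
import math


def chi2_sf_even_df(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution.

    Uses the closed form that holds only for even degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only holds for positive even df")
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for a null model and a selection model.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-990.0, df=2)
print(f"2*dlnL = {stat:.2f}, p = {p:.3g}")
```

A small p-value only indicates that the model allowing a class of sites with dN/dS > 1 fits better than its null; identifying a specific codon such as 614 requires a separate site-level analysis, which this sketch does not cover.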

Article activity feed

  1. SciScore for 10.1101/2020.06.25.170688:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Model selection was conducted in MEGA X. | MEGA; suggested: (Mega BLAST, RRID:SCR_011920)
    The tree is drawn to scale, and FigTree V1.4 was utilized to form cladogram branches (http://tree.bio.ed.ac.uk/software/figtree/). | FigTree; suggested: (FigTree, RRID:SCR_008515)
    These analyses were carried out using DnaSP 6.12.03[11]. | DnaSP; suggested: (DnaSP, RRID:SCR_003067)
    HyPhy package was used to validate the result obtained by ML method[22]. | HyPhy; suggested: (HyPhy, RRID:SCR_016162)
    Model quality was evaluated by QMEAN while the structure of the model was visualized by using PyMoL [23]. | PyMoL; suggested: (PyMOL, RRID:SCR_000305)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 42. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.