Positive selection within the genomes of SARS-CoV-2 and other Coronaviruses independent of impact on protein function

Abstract

The emergence of a novel coronavirus (SARS-CoV-2) associated with severe acute respiratory disease (COVID-19) has prompted efforts to understand the genetic basis for its unique characteristics and its jump from non-primate hosts to humans. Tests for positive selection can identify apparently nonrandom patterns of mutation accumulation within genomes, highlighting regions where molecular function may have changed during the origin of a species. Several recent studies of the SARS-CoV-2 genome have identified signals of conservation and positive selection within the gene encoding Spike protein based on the ratio of synonymous to nonsynonymous substitution. Such tests cannot, however, detect changes in the function of RNA molecules.

Methods

Here we apply a test for branch-specific oversubstitution of mutations within narrow windows of the genome without reference to the genetic code.

Results

We recapitulate the finding that the gene encoding Spike protein has been a target of both purifying and positive selection. In addition, we find other likely targets of positive selection within the genome of SARS-CoV-2, specifically within the genes encoding Nsp4 and Nsp16. Homology-directed modeling indicates no change in either Nsp4 or Nsp16 protein structure relative to the most recent common ancestor. These SARS-CoV-2-specific mutations may affect molecular processes mediated by the positive or negative RNA molecules, including transcription, translation, RNA stability, and evasion of the host innate immune system. Our results highlight the importance of considering mutations in viral genomes not only from the perspective of their impact on protein structure, but also how they may impact other molecular processes critical to the viral life cycle.

Article activity feed

  1. SciScore for 10.1101/2020.09.16.300038

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: This computational methodology makes use of a likelihood ratio test based on the maximum likelihood estimates obtained from HyPhy v2.5 (Pond, Frost & Muse, 2005; Pond et al., 2020).
    Resource: HyPhy
    Suggested: (HyPhy, RRID:SCR_016162)

    Sentence: To plot these we took the average from each alignment and plot it using the library Gviz and Bioconductor (Hahne & Ivanek, 2016) in R. Testing for Recombination: Inference of branch specific selection can be confounded by recombination given that a single phylogenetic tree may not explain the evolution of viruses.
    Resource: Bioconductor
    Suggested: (Bioconductor, RRID:SCR_006442)

    Sentence: Here, we screened for evidence of recombination in two ways, one, by estimating phylogenetic trees in sliding windows of 500 bp and a step of 150 along coronavirus alignment using RaXML-NG v0.9 (Kozlov et al., 2019).
    Resource: RaXML-NG
    Suggested: None

    Sentence: To align these sequences, we used MAFFT (Katoh & Standley, 2013)
    Resource: MAFFT
    Suggested: (MAFFT, RRID:SCR_011811)

    Sentence: Pairwise comparisons of predicted protein structures were visualized using PyMOL software (DeLano, 2002).
    Resource: PyMOL
    Suggested: (PyMOL, RRID:SCR_000305)

    Sentence: Alignment and structural comparisons performed by FATCAT (Ye & Godzik, 2004).
    Resource: FATCAT
    Suggested: (FATCAT, RRID:SCR_014631)
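
Two steps named in the sentences quoted above lend themselves to a small illustration: tiling an alignment into sliding windows of 500 bp with a step of 150, and turning a pair of maximum-likelihood scores into a likelihood ratio test. The Python sketch below is not the authors' pipeline and does not call HyPhy or the tree-building software. The function names, the 0-based half-open coordinates, the choice to drop a trailing partial window, and the chi-square reference distribution with one degree of freedom are all assumptions made for illustration.

```python
import math


def sliding_windows(alignment_length, size=500, step=150):
    """Yield (start, end) coordinates of windows along an alignment.

    Coordinates are 0-based and half-open (an assumption). A trailing
    window shorter than `size` is dropped (also an assumption).
    """
    for start in range(0, alignment_length - size + 1, step):
        yield start, start + size


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (statistic, p_value) for two nested models.

    Assumes the statistic follows a chi-square distribution with one
    degree of freedom under the null; the preprint may use a different
    reference distribution.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For one degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical alignment length and log-likelihoods, for illustration only.
    for start, end in sliding_windows(1000):
        print(start, end)
    stat, p = likelihood_ratio_test(lnl_null=-1002.0, lnl_alt=-1000.0)
    print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

In a real analysis the log-likelihoods would come from the fitted null and alternative models for each window, and the resulting p-values would normally be corrected for the number of windows tested.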

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    For coronaviruses this is a notable limitation, given that many aspects of the lifecycle involve RNA function (Madhugiri et al., 2016; Ziv et al., 2020; Alhatlani, 2020). In addition, the secondary structure of some segments within the RNA genome is well conserved among coronavirus species, which implies a functional role (Rangan et al., 2020; Sanders et al., 2020; Huston et al., 2020a). Indeed, the SARS-CoV-2 genome is reported to contain more well-structured regions than any other known virus, including both coding and noncoding regions of the genome (Huston et al., 2020a). We therefore examined nucleotide substitutions within regions of putative positive selection in Nsp4 and Nsp16 for their likely impact on both protein and RNA structure (Fig 4 and 5). In the case of Nsp4 protein, two nearly adjacent nonsynonymous substitutions at residues 380 and 382 occurred on the branch leading to SARS-CoV-2 (Fig 3B). These both involve changing side chains with similar biochemical properties, respectively valine to alanine and valine to isoleucine. Homology-directed modeling of protein structure suggests that these two amino acid substitutions have very little impact on either secondary or tertiary structure when comparing the SARS-CoV-2 protein orthologue to those of the other species examined (Fig 4A). In the case of Nsp16 protein, no nonsynonymous substitutions evolved on the branch leading to SARS-CoV-2. Thus, the signal of positive selection within Nsp4 is unlikely to reflect ch...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 30, 31 and 32. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.