SARS-CoV-2 amino acid substitutions widely spread in the human population are mainly located in highly conserved segments of the structural proteins

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic offers a unique opportunity to study the introduction and evolution of a pathogen in a completely naïve human population. We identified and analysed the amino acid mutations that gained prominence worldwide in the early months of the pandemic. Eight mutations were identified along the viral genome, mostly located in conserved segments of the structural proteins with low variability among coronaviruses, which suggests that they might have a functional impact. At the time of writing, these mutations show varied success in the SARS-CoV-2 population, ranging from a change in the spike protein that has become predominant, through two mutations in the nucleocapsid protein with frequencies around 25%, to a mutation in the matrix protein that nearly faded out after reaching a frequency of 20%.

Article activity feed

  1. SciScore for 10.1101/2020.05.16.099499:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Analysis of non-synonymous mutations and selection of mutations to be studied: Complete genomes were aligned using the multiple alignment program ClustalW (Thompson et al. 1994) and consequently split by week according to their isolation date with the sequence alignment editor Bioedit (Hall 1999).
    ClustalW
    suggested: (ClustalW, RRID:SCR_017277)
    3D structures were rendered using PyMOL (The PyMOL Molecular Graphics System, Version 2.3.4.
    PyMOL
    suggested: (PyMOL, RRID:SCR_000305)
    The set of proteomic utilities in EXPASY (https://www.expasy.org/proteomics) was used to check for different aspects on the mutant proteins (motifs, phosphorylation sites, etc.).
    EXPASY
    suggested: None
    PROVEAN 1.1. (http://provean.jcvi.org/, Choi et al. 2012) was used to gain insight on whether the mutation could be deleterious or neutral.
    PROVEAN
    suggested: (PROVEAN, RRID:SCR_002182)
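
The first sentence quoted above describes aligning genomes and splitting them by week of isolation in order to follow mutations over time. The sketch below illustrates that idea only; it is not the authors' pipeline, which used ClustalW and BioEdit. The function name `weekly_frequency`, the toy sequences, and the choice of ISO calendar weeks are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def weekly_frequency(records, column, residue):
    """Return {(iso_year, iso_week): fraction of sequences with `residue`}.

    `records` is an iterable of (isolation_date, aligned_sequence) pairs.
    `column` is a 0-based index into the alignment (hypothetical layout).
    Gaps ('-') and ambiguous residues ('X') are excluded from the totals.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for isolation_date, sequence in records:
        iso = isolation_date.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        amino_acid = sequence[column]
        if amino_acid in ("-", "X"):
            continue  # no usable call at this position
        totals[week] += 1
        if amino_acid == residue:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


# Toy data, invented for illustration: a four-residue aligned fragment.
records = [
    (date(2020, 2, 3), "MKDL"),
    (date(2020, 2, 5), "MKDL"),
    (date(2020, 3, 2), "MKGL"),
    (date(2020, 3, 4), "MKDL"),
    (date(2020, 3, 5), "MKGL"),
    (date(2020, 3, 6), "MK-L"),  # gap at the column of interest, skipped
]

freq = weekly_frequency(records, column=2, residue="G")
for week, value in freq.items():
    print(week, round(value, 3))
```

Tracking a substitution's share of sequences per week in this way is one simple route to telling apart the trajectories the abstract describes: changes that become predominant, changes that plateau, and changes that fade out.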

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Certainly, this approach has the limitation of neglecting some mutations with lesser prevalence that may still be biologically significant. Time will tell. It is worth noting that 7 out of 8 of the widely spread mutations occurred in residues that were highly conserved in related coronaviruses of bats, pangolins, civets, or in SARS-CoV (Fig. 2). Conserved regions are usually assumed to be functionally relevant and thus, mutations in them may have deleterious effects or be poorly tolerated; if so, they will probably be removed in the future. A mutation in a highly conserved region that becomes widespread and persists can be thought of as representing a change that increases viral fitness. In the present case, we found three different situations: mutations that expanded and rose to predominance, mutations that expanded to a certain extent and then faded out, and mutations that are apparently expanding but not yet predominant. This pattern affecting conserved regions was also seen for SARS-CoV, although the affected proteins were different. Interestingly, in SARS-CoV-2 most of the mutations were in structural proteins, while in SARS-CoV they were in non-structural ones, suggesting that the adaptation process from the original host species to humans was different in these two cases. When the spike protein was examined, this difference was more obvious. Mutations in SARS-CoV occurred in positions conserved in the civet and bat-related coronaviruses but different from those of pangolin a...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.