Characterizations of SARS-CoV-2 mutational profile, spike protein stability and viral transmission

Abstract

The SARS-CoV-2 pandemic has affected more than 3.0 million people worldwide, with more than 200 thousand reported deaths. The SARS-CoV-2 genome can acquire mutations rapidly as the virus spreads. Whole genome sequencing data offer a wide range of opportunities to study mutation dynamics. The growing amount of SARS-CoV-2 whole genome sequence data prompted us to explore the mutation profile across the genome, to assess genome diversity, and to investigate the implications of those mutations for protein stability and viral transmission. Four proteins, surface glycoprotein, nucleocapsid, ORF1ab and ORF8, showed frequent mutations, while the envelope, membrane, ORF6 and ORF7a proteins were conserved in terms of amino acid substitutions. Some mutations across different proteins co-occurred, suggesting functional cooperation in stability, transmission and adaptability. Combined analysis of the frequently mutated residues identified 20 viral variants, among which 12 specific combinations comprised more than 97% of the isolates considered for the analysis. Analysis of the protein structure stability of surface glycoprotein mutants indicated that specific variants are viable and more prone to be temporally and spatially distributed across the globe. Similar empirical analysis of other proteins indicated important functional implications for several variants. Analysis of co-occurring mutations indicated structural and/or functional interactions among different SARS-CoV-2 proteins. Identification of frequently mutated variants among COVID-19 patients might be useful for better clinical management, contact tracing and containment of the disease.
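
The kind of analysis the abstract describes (finding variable sites in aligned protein sequences, then grouping isolates by the residues they carry at those sites) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used MUSCLE in MEGA-X. The function names `variable_sites` and `variant_profiles`, the gap symbols, the `min_minor_count` threshold and the toy sequences are all assumptions made for illustration, and column indices are 0-based.

```python
from collections import Counter

# Symbols treated as "no information" in an alignment column (assumption).
GAP_SYMBOLS = {"-", "X"}


def variable_sites(aligned, min_minor_count=1):
    """Map each variable alignment column (0-based) to its residue counts.

    `aligned` is a dict of isolate name -> aligned sequence; all sequences
    must have the same length. A column is reported when at least
    `min_minor_count` isolates differ from the most common residue.
    """
    sequences = list(aligned.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")

    sites = {}
    for col in range(length):
        counts = Counter(
            seq[col] for seq in sequences if seq[col] not in GAP_SYMBOLS
        )
        if len(counts) < 2:
            continue  # conserved (or fully gapped) column
        minor = sum(counts.values()) - max(counts.values())
        if minor >= min_minor_count:
            sites[col] = counts
    return sites


def variant_profiles(aligned, columns):
    """Count isolates per combination of residues at the chosen columns."""
    return Counter(
        "".join(seq[col] for col in columns) for seq in aligned.values()
    )


# Hypothetical toy alignment, not real SARS-CoV-2 data.
toy_alignment = {
    "isolate_1": "MFDLK",
    "isolate_2": "MFGLK",
    "isolate_3": "MFGLR",
    "isolate_4": "MFGLK",
}

sites = variable_sites(toy_alignment)
profiles = variant_profiles(toy_alignment, sorted(sites))

for combination, count in profiles.most_common():
    share = 100.0 * count / len(toy_alignment)
    print(f"{combination}: {count} isolates ({share:.0f}%)")
```

Raising `min_minor_count` restricts the profile to frequently mutated residues, which is one way to keep the number of distinct combinations small enough to interpret.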

Article activity feed

  1. SciScore for 10.1101/2020.05.03.066266:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Identification of variable sites: We have aligned nucleotide and amino acid sequences of ORF1ab, ORF3a, ORF6, ORF7a, ORF8, ORF10, envelope (E), membrane (M), nucleocapsid (N) and surface glycoprotein (S) using the MUSCLE multiple sequence alignment algorithm in MEGA-X [17].
    MUSCLE
    suggested: (MUSCLE, RRID:SCR_011812)
    , Pangolin coronavirus (MT072864.1) and two SARS-CoV strains TW11 (AY502924.1) and GD01 (AY278489.2) were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/protein).
    NCBI
    suggested: (NCBI, RRID:SCR_006472)
    https://www.ncbi.nlm.nih.gov/protein
    suggested: (Protein Database, RRID:SCR_017486)
    We used the FoldX BuildModel function to construct the mutant 3D protein structures [23].
    FoldX
    suggested: (FoldX, RRID:SCR_008522)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.