SARS-CoV2 (COVID-19) Structural/Evolution Dynamicome: Insights into functional evolution and human genomics

Abstract

The SARS-CoV-2 pandemic, which began in 2019, has challenged the speed at which labs can perform science, from discovering the viral composition to managing health outcomes in humans. The small (~30 kb) single-stranded RNA genome of coronaviruses makes them adept at cross-species spread and drift, increasing the probability that they cause pandemics. However, this small genome also allows for a robust understanding of all proteins encoded by the virus. We employed protein modeling, molecular dynamics simulations, evolutionary mapping, and 3D printing to gain a full proteome- and dynamicome-level understanding of SARS-CoV-2. The Viral Integrated Structural Evolution Dynamic Database (VIStEDD) has been established (prokoplab.com/vistedd), enabling future discoveries and educational use. In this paper, we highlight VIStEDD usage for nsp6, Nucleocapsid (N), and Spike (S) surface glycoprotein. For both nsp6 and N, we reveal highly conserved surface amino acids that likely drive protein-protein interactions. In characterizing the viral S protein, we developed a quantitative dynamics cross-correlation matrix to gain insight into its interaction with the ACE2/SLC6A19 dimer complex. From this quantitative matrix, we identified 47 potential functional missense variants from population genomic databases within ACE2/SLC6A19/TMPRSS2, warranting genomic enrichment analyses in SARS-CoV-2 patients. Moreover, these variants have ultralow frequency but can exist as hemizygous in males for ACE2, which lies on the X chromosome. Two noncoding ACE2 variants (rs4646118 and rs143185769), found in ~9% of individuals of African descent, may regulate expression and be related to the increased susceptibility of African Americans to SARS-CoV-2. This powerful database of SARS-CoV-2 can aid research progress in the ongoing pandemic.
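
For readers unfamiliar with a dynamics cross-correlation matrix, the sketch below illustrates the commonly used normalized form, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>), where Δr_i is the displacement of atom i from its mean position and the averages run over trajectory frames. This is an illustration only: the abstract does not specify the exact formulation or software settings used, and the function name `dccm`, the `frames` data layout, and the toy trajectory are assumptions made for this example.

```python
import math


def dccm(frames):
    """Return a normalized dynamic cross-correlation matrix (values in [-1, 1]).

    frames: list of frames; each frame is a list of (x, y, z) tuples, one per
    atom (for example C-alpha atoms), assumed already aligned to a reference.
    Hypothetical helper for illustration, not the authors' pipeline.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])

    # Mean position of each atom over the trajectory.
    mean = [
        tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]

    # Time-averaged dot products <dr_i . dr_j> of displacements from the mean.
    cov = [[0.0] * n_atoms for _ in range(n_atoms)]
    for f in frames:
        d = [tuple(f[i][k] - mean[i][k] for k in range(3)) for i in range(n_atoms)]
        for i in range(n_atoms):
            for j in range(i, n_atoms):
                dot = sum(d[i][k] * d[j][k] for k in range(3))
                cov[i][j] += dot / n_frames
                cov[j][i] = cov[i][j]

    # Normalize by each atom's own fluctuation magnitude.
    corr = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        for j in range(n_atoms):
            denom = math.sqrt(cov[i][i] * cov[j][j])
            corr[i][j] = cov[i][j] / denom if denom > 0 else 0.0
    return corr


# Toy three-frame trajectory (made-up coordinates): atoms 0 and 1 move
# together along x, while atom 2 moves in the opposite direction.
toy = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(2.0, 0.0, 0.0), (7.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
c = dccm(toy)
print(round(c[0][1], 3), round(c[0][2], 3))  # correlated pair, anti-correlated pair
```

In this toy case the pair moving together scores close to +1 and the pair moving in opposite directions scores close to -1. Applied to a simulated complex, the same kind of matrix flags residue pairs whose motions are coupled across an interface.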

Article activity feed

  1. SciScore for 10.1101/2020.05.15.098616:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    For those proteins without structural homologs we utilized models that are part of the I-TASSER SARS-CoV-2 database (Zhang et al., 2020).
    I-TASSER
    suggested: (I-TASSER, RRID:SCR_014627)
    Each of these models were then fed through homology modeling in YASARA to normalize energetic predictions to the homology models.
    YASARA
    suggested: (YASARA, RRID:SCR_017591)
    Sequences (within the genomics folder of each protein) were extracted using the sequences listed in table 1 with BLASTp against the non-redundant protein sequences (nr) and aligned using ClustalW (Larkin et al., 2007).
    BLASTp
    suggested: (BLASTP, RRID:SCR_001010)
    Within PyMOL the structure was also exported as a vrml file for 3D printing.
    PyMOL
    suggested: (PyMOL, RRID:SCR_000305)
    Vertebrate sequences of the three proteins were extracted using NCBI orthologs for the transcript, open reading frames assessed using Transdecoder (Haas et al., 2013), and aligned using ClustalW codons in MEGA.
    ClustalW
    suggested: (ClustalW, RRID:SCR_017277)
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)
    Conservation was performed on the data as previously published (Prokop et al., 2018).
    Conservation
    suggested: (Conservation, RRID:SCR_016064)
    Genomic missense variants were extracted from gnomADv2 for each of the three genes followed by assessment using PolyPhen2 (Adzhubei et al., 2010), SIFT (Ng and Henikoff, 2003), and Align-GVGD (Tavtigian et al., 2006).
    PolyPhen2
    suggested: None
    SIFT
    suggested: (SIFT, RRID:SCR_012813)

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al. (2015).


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 21. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.