Protein covariance networks reveal interactions important to the emergence of SARS coronaviruses as human pathogens

This article has been reviewed by the following groups


Abstract

SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that have likely emerged from animal reservoirs based on genomic similarities to bat and other animal viruses. Here we report the analysis of conserved interactions between amino acid residues in proteins encoded by SARS-CoV-related viruses. We identified pairs and networks of residue variants that exhibited statistically high frequencies of covariance with each other. While these interactions are likely key to both protein structure and other protein-protein interactions, we have also found that they can be used to provide a new computational approach (CoVariance-based Phylogeny Analysis) for understanding viral evolution and adaptation. Our data provide evidence that the evolutionary processes that converted a bat virus into a human pathogen occurred through recombination with other viruses in combination with new adaptive mutations important for entry into human cells.
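
To make the idea of covarying residue pairs concrete, the sketch below scores pairs of alignment columns by mutual information. This is an illustration only: the abstract does not state which covariance statistic the authors used, so mutual information is a stand-in, and the function names, the toy alignment, and the threshold are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    NOTE: mutual information is an assumed stand-in statistic; the
    source does not say how covariance was actually measured.
    """
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    score = 0.0
    for (a, b), count in freq_ab.items():
        p_joint = count / n
        p_indep = (freq_a[a] / n) * (freq_b[b] / n)
        score += p_joint * log2(p_joint / p_indep)
    return score


def covarying_pairs(alignment, threshold):
    """Return (i, j, score) for column pairs scoring at or above threshold."""
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    pairs = []
    for i in range(length):
        for j in range(i + 1, length):
            score = mutual_information(columns[i], columns[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Hypothetical toy alignment: columns 0 and 1 always change together.
toy_alignment = ["ADKL", "ADKL", "GEKL", "GEKM"]
pairs = covarying_pairs(toy_alignment, threshold=0.9)
```

On the toy alignment only the column pair (0, 1) passes the threshold. On real data a raw score like this would still need a significance test and a correction for shared ancestry before being read as evidence of interaction.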

Article activity feed

  1. SciScore for 10.1101/2020.06.05.136887:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Protein sequences for genes in 1a, 1b, S, 3a, E, M, and N were concatenated and aligned using CLC Bio Workbench (v8) for the 850 genomes using a gap open cost of 2.0 and a gap extension cost of 1.0 in very accurate mode and with MAFFT for the clinical SARS-CoV-2 strain using the default FFT-NS-2 setting (CLC-Bio Workbench 8, Nakamura et al., 2018). 13,611 clinical strains available on May 10, 2020 with significant contiguous coverage over the reference genes (>95%) were kept in the alignment.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    This allowed binning of clusters and respective strains for Force Mapping in Gephi using the Multigravity ForceAtlas 2 setting and comparison of covariant residues based on clusters and strains (Jacomy et al., 2014).
    Gephi
    suggested: (Gephi, RRID:SCR_004293)
    Residues in S protein were mapped onto the PDB structure for the S trimer (6VXX.pdb and 6ACC.pdb) using PyMol (v.2.3.4) (PyMol, Song et al., 2018; Walls et al., 2020).
    PyMol
    suggested: (PyMOL, RRID:SCR_000305)
    Arpeggio was used to calculate interacting residues in the PDB file (Jubb et al., 2017).
    Arpeggio
    suggested: (Arpeggio, RRID:SCR_010876)
    Circular graphing of key collections of residues was done using Circos (Krzywinski et al., 2009).
    Circos
    suggested: (Circos, RRID:SCR_011798)
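
The coverage filter quoted in the first row of the table above (strains kept only with >95% coverage over the reference genes) can be sketched as follows. This is a minimal illustration under stated assumptions: coverage is taken to be the fraction of reference positions where the sample has a resolved residue, `-` and `X` are treated as missing, and the "contiguous" aspect of the authors' criterion is not modeled. The function names and toy sequences are hypothetical.

```python
def coverage(reference_row, sample_row, missing="-X"):
    """Fraction of reference (non-gap) positions resolved in the sample.

    ASSUMPTION: '-' (gap) and 'X' (unknown residue) count as missing;
    the source does not define how coverage was computed.
    """
    ref_positions = [i for i, r in enumerate(reference_row) if r != "-"]
    if not ref_positions:
        return 0.0
    covered = sum(1 for i in ref_positions if sample_row[i] not in missing)
    return covered / len(ref_positions)


def filter_by_coverage(reference_row, samples, min_coverage=0.95):
    """Keep aligned samples whose coverage is strictly above min_coverage."""
    return {
        name: row
        for name, row in samples.items()
        if coverage(reference_row, row) > min_coverage
    }


# Hypothetical aligned rows, all the same length as the reference row.
reference = "A" * 20
samples = {
    "full": "A" * 20,               # coverage 1.00, kept
    "borderline": "A" * 19 + "-",   # coverage 0.95, not strictly above
    "partial": "A" * 10 + "X" * 10, # coverage 0.50, dropped
}
kept = filter_by_coverage(reference, samples)
```

The strict comparison mirrors the ">95%" wording in the quoted sentence; whether the authors treated exactly 95% as passing is not stated.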

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    While nucleic acid sequence-based phylogenies are informative they clearly have limitations. Thus, we focused on variations in protein sequence to understand CoV evolution and the key functional interactions that drive adaption to new hosts or that influence transmission and pathogenicity. We selected the conserved CoV proteins called 1a/1b, Spike (S protein), 3a, E, M, and N from a set of 850 viral genomes (Supplemental table S1). The alignment resulted in a 9639 amino acid consensus sequence with only 2% of sites being gaps with low coverage and 2% with low sequence conservation (Supplemental File F1). Because there are regions of diverged nucleotide identity in these viral genomes, our goal was to use amino acid identity to initially estimate phylogeny and then integrate that analysis with the identification of covariant residues within these six core viral proteins. Both SARS-CoV and SARS-CoV-2 are represented by a large collection of independent isolates from previous and current epidemics as well as from variants selected during passage in various laboratories (Supplemental Table S1) (Elbe and Buckland-Merrett, 2017). Similar to nucleic acid sequence-based analyses, our constructed phylogeny (Supplemental Figure S1) shows that SARS-CoV is closely related to CoVs found in civets and groups of bat CoVs that were previously suggested as likely ancestors (Li, 2005; Song et al., 2005). In contrast, SARS-CoV-2 is related to bat CoV RATG13 and also more similar to two bat CoVs...
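
The per-site figures quoted above (about 2% of sites being gaps with low coverage and 2% with low sequence conservation) correspond to column-wise summaries of the alignment. The sketch below shows one way such fractions could be computed; the cutoffs, the definition of conservation as the frequency of the most common residue, and the function name are assumptions, since the excerpt does not give the authors' definitions.

```python
from collections import Counter


def site_summary(alignment, max_gap_fraction=0.5, min_conservation=0.5):
    """Return (fraction of gappy sites, fraction of low-conservation sites).

    ASSUMPTIONS: both thresholds are illustrative, and conservation is
    measured as the frequency of the most common non-gap residue.
    """
    n_seqs = len(alignment)
    n_sites = len(alignment[0])
    gappy = 0
    low_conservation = 0
    for i in range(n_sites):
        column = [seq[i] for seq in alignment]
        gap_fraction = column.count("-") / n_seqs
        if gap_fraction > max_gap_fraction:
            gappy += 1
            continue
        residues = [c for c in column if c != "-"]
        if not residues:
            continue  # all-gap column that was not flagged as gappy
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(residues) < min_conservation:
            low_conservation += 1
    return gappy / n_sites, low_conservation / n_sites


# Hypothetical toy alignment: column 3 is all gaps, column 4 is variable.
toy = ["MKV-A", "MKL-C", "MKI-D", "MKV-E"]
gap_share, low_cons_share = site_summary(toy)
```

In the toy alignment one of five sites is gappy and one of five is poorly conserved, so both shares come out to 0.2.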

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.