Novel Immunoglobulin Domain Proteins Provide Insights into Evolution and Pathogenesis of SARS-CoV-2-Related Viruses

Abstract

The ongoing COVID-19 pandemic strongly emphasizes the need for a more complete understanding of the biology and pathogenesis of its causative agent, SARS-CoV-2. Despite intense scrutiny, several proteins encoded by the genomes of SARS-CoV-2 and other SARS-like coronaviruses remain enigmatic. Moreover, the high infectivity of SARS-CoV-2 and its severity in certain individuals currently make wet-lab studies challenging. In this study, we used a series of computational strategies to identify several fast-evolving regions of SARS-CoV-2 proteins that are potentially under host immune pressure. Most notably, the hitherto-uncharacterized protein encoded by ORF8 is one of them. Using sensitive sequence and structural analysis methods, we show that ORF8 and several other proteins from alpha- and beta-coronaviruses comprise novel families of immunoglobulin domain proteins, which might function as immune modulators to delay or attenuate the host immune response against the viruses.

Article activity feed

  1. SciScore for 10.1101/2020.03.04.977736:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    The program CD-HIT was used for similarity-based clustering (13).
    CD-HIT
    suggested: (CD-HIT, RRID:SCR_007105)
    Based on the MSA, a similarity plot was constructed by a custom Python script, which calculated the identity between each subject sequence and the SARS-CoV-2 genome sequence based on a custom sliding window size and step size.
    Python
    suggested: (IPython, RRID:SCR_001658)
    Similarity-based clustering was conducted by BLASTCLUST, a BLAST score-based single-linkage clustering method (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html).
    BLASTCLUST
    suggested: (BLASTClust, RRID:SCR_016641)
    Multiple sequence alignments were built by the KALIGN (14), MUSCLE (16), and PROMALS3D (17) programs, followed by careful manual adjustments based on the profile–profile alignment, the secondary structure information, and the structural alignment.
    KALIGN
    suggested: (Kalign, RRID:SCR_011810)
    The alignments were colored using an in-house alignment visualization program written in Perl and further modified using Adobe Illustrator.
    adobe illustrator
    suggested: (Adobe Illustrator, RRID:SCR_010279)
    Identification of distinct viral Ig domain proteins: By using the protein remote relationship detection methods, we generated a collection of distinct Ig domains from the Pfam database (21) and also from our local domain database.
    Pfam
    suggested: (Pfam, RRID:SCR_004726)
    Then, we utilized the hmmscan program of the HMMER package (22) and RPS-BLAST (12, 23) to retrieve the homologs from viral genomes.
    HMMER
    suggested: (Hmmer, RRID:SCR_005305)
    The tree diagram was generated using MEGA Tree Explorer (26). Entropy analysis: Position-wise Shannon entropy (H) for a given multiple sequence alignment was calculated using the equation $H = -\sum_{i=1}^{M} P_i \log_2 P_i$, where $P_i$ is the fraction of residues of amino acid type i, and M is the number of amino acid types.
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)
    Since in these low sequence-identity cases, sequence alignment is the most important factor affecting the quality of the model (Cozzetto and Tramontano, 2005), alignments used in this study have been carefully built and cross-validated based on the information from HHpred and edited manually using the secondary structure information.
    HHpred
    suggested: (HHpred, RRID:SCR_010276)
    Structural analysis and comparison were conducted using the molecular visualization program PyMOL (30).
    PyMOL
    suggested: (PyMOL, RRID:SCR_000305)
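
The similarity-plot sentence above describes a custom Python script that computes identity to the SARS-CoV-2 genome over a sliding window. That script is not available here, so the following is only a minimal sketch of the general technique. The function name `sliding_identity`, the default window and step sizes, and the handling of gap columns are all assumptions, not the authors' choices.

```python
from typing import List, Tuple


def sliding_identity(
    ref: str, subj: str, window: int = 100, step: int = 10
) -> List[Tuple[int, float]]:
    """Percent identity of `subj` to `ref` in sliding windows.

    Both inputs are rows taken from the same multiple sequence alignment,
    so they must have equal length. Window and step defaults are
    illustrative assumptions only.
    """
    if len(ref) != len(subj):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    points = []
    for start in range(0, len(ref) - window + 1, step):
        matches = 0
        compared = 0
        for a, b in zip(ref[start:start + window], subj[start:start + window]):
            if a == "-" and b == "-":
                continue  # assumption: skip columns that are gaps in both rows
            compared += 1
            if a == b:
                matches += 1
        identity = 100.0 * matches / compared if compared else 0.0
        points.append((start, identity))
    return points


if __name__ == "__main__":
    # Toy aligned sequences, not real coronavirus data.
    for start, pct in sliding_identity("ATGCATGCAT", "ATGCTTGCAT", window=5, step=5):
        print(start, pct)
```

Each returned pair is a window start position and the percent identity within that window; plotting these pairs for every subject sequence would give a similarity plot of the kind described.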

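The entropy-analysis sentence in the table above defines position-wise Shannon entropy in terms of the fraction P of residues of each amino acid type at an alignment column. A minimal sketch of that calculation follows. The base-2 logarithm, the exclusion of gap and unknown characters, and the function names are assumptions made for illustration and are not confirmed by the text here.

```python
import math
from collections import Counter
from typing import List

# Assumption: gaps and unknown residues are excluded from the counts.
IGNORED = set("-.X")


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column."""
    residues = [c for c in column.upper() if c not in IGNORED]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # Sum of -P_i * log2(P_i) over the residue types present in the column.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(seqs: List[str]) -> List[float]:
    """Entropy for every column of an alignment given as equal-length rows."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have equal length")
    return [column_entropy("".join(col)) for col in zip(*seqs)]


if __name__ == "__main__":
    # Toy alignment: the first column is invariant, the second is maximally mixed.
    print(alignment_entropy(["AC", "AG", "AT", "AA"]))
```

Under these assumptions a fully conserved column scores 0, and the score rises as more residue types share the column more evenly, which is why high-entropy stretches can flag fast-evolving regions.
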
    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.