Novel Immunoglobulin Domain Proteins Provide Insights into Evolution and Pathogenesis of SARS-CoV-2-Related Viruses
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The ongoing COVID-19 pandemic strongly emphasizes the need for a more complete understanding of the biology and pathogenesis of its causative agent, SARS-CoV-2. Despite intense scrutiny, several proteins encoded by the genomes of SARS-CoV-2 and other SARS-like coronaviruses remain enigmatic. Moreover, the high infectivity and severity of SARS-CoV-2 in certain individuals currently make wet-lab studies challenging. In this study, we used a series of computational strategies to identify several fast-evolving regions of SARS-CoV-2 proteins that are potentially under host immune pressure. Most notably, the hitherto-uncharacterized protein encoded by ORF8 is one of them. Using sensitive sequence and structural analysis methods, we show that ORF8 and several other proteins from alpha- and beta-coronaviruses comprise novel families of immunoglobulin domain proteins, which might function as immune modulators to delay or attenuate the host immune response against the viruses.
Article activity feed
SciScore for 10.1101/2020.03.04.977736:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

| Sentences | Resources |
| --- | --- |
| The program CD-HIT was used for similarity-based clustering (13). | CD-HIT; suggested: (CD-HIT, RRID:SCR_007105) |
| Based on the MSA, a similarity plot was constructed by a custom Python script, which calculated the identity between each subject sequence and the SARS-CoV-2 genome sequence based on a custom sliding window size and step size. | Python; suggested: (IPython, RRID:SCR_001658) |
| Similarity-based clustering was conducted by BLASTCLUST, a BLAST score-based single-linkage clustering method (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html). | BLASTCLUST; suggested: (BLASTClust, RRID:SCR_016641) |
| Multiple sequence alignments were built by the KALIGN (14), MUSCLE (16) and PROMALS3D (17) programs, followed by careful manual adjustments based on the profile–profile alignment, the secondary structure information and the structural alignment. | KALIGN; suggested: (Kalign, RRID:SCR_011810) |
| The alignments were colored using an in-house alignment visualization program written in perl and further modified using adobe illustrator. | adobe illustrator; suggested: (Adobe Illustrator, RRID:SCR_010279) |
| Identification of distinct viral Ig domain proteins: By using the protein remote relationship detection methods, we generated a collection of distinct Ig domains from the Pfam database (21) and also from our local domain database. | Pfam; suggested: (Pfam, RRID:SCR_004726) |
| Then, we utilized the hmmscan program of the HMMER package (22) and RPS-BLAST (12, 23) to retrieve the homologs from viral genomes. | HMMER; suggested: (Hmmer, RRID:SCR_005305) |
| The tree diagram was generated using MEGA Tree Explorer (26). Entropy analysis: Position-wise Shannon entropy (H) for a given multiple sequence alignment was calculated using the equation: P is the fraction of residues of amino acid type i, and M is the number of amino acid types. | MEGA; suggested: (Mega BLAST, RRID:SCR_011920) |
| Since in these low sequence-identity cases, sequence alignment is the most important factor affecting the quality of the model (Cozzetto and Tramontano, 2005), alignments used in this study have been carefully built and cross-validated based on the information from HHpred and edited manually using the secondary structure information. | HHpred; suggested: (HHpred, RRID:SCR_010276) |
| Structural analysis and comparison were conducted using the molecular visualization program PyMOL (30). | PyMOL; suggested: (PyMOL, RRID:SCR_000305) |

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
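The Resources table quotes a custom Python script that computes identity between each subject sequence and the SARS-CoV-2 genome over a sliding window, and no open code was detected for it. The following is only a minimal sketch of how such a calculation might look, not the authors' script. The function name `sliding_window_identity`, the default window and step sizes, and the gap handling (columns where both rows are gaps are skipped; a gap opposite a residue counts as a mismatch) are all assumptions made for illustration.

```python
from typing import List, Tuple


def sliding_window_identity(
    reference: str,
    subject: str,
    window: int = 100,  # hypothetical default, not taken from the paper
    step: int = 20,     # hypothetical default, not taken from the paper
) -> List[Tuple[int, float]]:
    """Return (window_start, percent_identity) pairs for two aligned rows.

    Both strings are assumed to be rows of the same multiple sequence
    alignment, so they must have equal length and use '-' for gaps.
    """
    if len(reference) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    reference = reference.upper()
    subject = subject.upper()
    points: List[Tuple[int, float]] = []

    # Slide over alignment columns; only full-length windows are scored.
    for start in range(0, len(reference) - window + 1, step):
        compared = 0
        matches = 0
        for r, s in zip(reference[start:start + window],
                        subject[start:start + window]):
            if r == "-" and s == "-":
                continue  # assumption: ignore columns that are gaps in both rows
            compared += 1
            if r == s:
                matches += 1
        identity = 100.0 * matches / compared if compared else 0.0
        points.append((start, identity))
    return points


if __name__ == "__main__":
    for start, pct in sliding_window_identity("ACGTACGT", "ACGTTCGT",
                                              window=4, step=2):
        print(start, pct)
```

The returned pairs can be passed to any plotting library to draw the similarity plot, with one curve per subject sequence.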
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
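The entropy analysis quoted in the Resources table describes position-wise Shannon entropy in terms of P (the fraction of residues of amino acid type i) and M (the number of amino acid types), but the equation itself did not survive in the quoted sentence. The sketch below assumes the standard Shannon form, H = -sum over i of P_i * log2(P_i), with a base-2 logarithm. The function names, the base of the logarithm, and the choice to exclude gap characters from the residue counts are assumptions for illustration and are not confirmed by the source.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gap characters ('-' and '.') are excluded before the
    residue fractions are computed.
    """
    residues = [c for c in column.upper() if c not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    fractions = [n / total for n in Counter(residues).values()]
    # Residue types that do not occur in the column contribute nothing
    # to the sum, so only observed types are iterated.
    return sum(-p * math.log2(p) for p in fractions)


def alignment_entropy(alignment: List[str]) -> List[float]:
    """Position-wise entropy for a list of equal-length aligned sequences."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) transposes rows into columns.
    return [column_entropy("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "AGF", "AGG"]  # made-up example data
    print(alignment_entropy(toy_alignment))
```

Under these assumptions, a fully conserved column scores 0, and higher values mark more variable positions, which is the kind of signal the abstract associates with fast-evolving regions.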