Molecular Mimicry Map (3M) of SARS-CoV-2: Prediction of potentially immunopathogenic SARS-CoV-2 epitopes via a novel immunoinformatic approach

Abstract

Currently, more than 33 million people have been infected by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and more than a million people have died from coronavirus disease 2019 (COVID-19), the disease caused by the virus. There have been multiple reports of autoimmune and inflammatory diseases following SARS-CoV-2 infection. Several mechanisms have been proposed for the development of autoimmune diseases, including cross-reactivity (molecular mimicry). A typical workflow for discovering cross-reactive epitopes (mimotopes) starts with a sequence similarity search between human and pathogen protein sequences. However, sequence similarity alone is not enough to predict cross-reactivity between proteins, because proteins can share highly similar conformational epitopes whose amino acid residues lie far apart in the linear protein sequences. Therefore, we used a hidden Markov model-based tool to identify distant viral homologs of human proteins. We also used experimentally determined and modeled protein structures of SARS-CoV-2 and human proteins to find homologous protein structures between them. Next, we predicted the binding affinity (IC50) of potentially cross-reactive T-cell epitopes to 34 MHC allelic variants that have been associated with autoimmune diseases, using multiple prediction algorithms. Overall, from 8,138 SARS-CoV-2 genomes, we identified 3,238 potentially cross-reactive B-cell epitopes covering six human proteins and 1,224 potentially cross-reactive T-cell epitopes covering 285 human proteins. To visualize the predicted cross-reactive T-cell and B-cell epitopes, we developed a web-based application, “Molecular Mimicry Map (3M) of SARS-CoV-2” (available at https://ahs2202.github.io/3M/).
The web application lets researchers explore potentially cross-reactive SARS-CoV-2 epitopes alongside custom peptide vaccines, so that they can identify potentially suboptimal peptide vaccine candidates, or less suitable parts of a whole-virus vaccine, and design safer vaccines for people with genetic and environmental predispositions to autoimmune diseases. Together, the computational resources and the interactive web application provide a foundation for investigating molecular mimicry in the pathogenesis of autoimmune disease following COVID-19.
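The abstract describes predicting IC50 values for each candidate T-cell epitope with multiple algorithms, which means several predictions per peptide have to be combined. The following is a minimal Python sketch of one common way to do this, under stated assumptions: the per-algorithm IC50 values are already available (the input dictionary, peptide names, and algorithm names are hypothetical), the consensus is the median, and the 500 nM binder cutoff is a widely used convention rather than a threshold reported here.

```python
from statistics import median

# Assumption: a conventional cutoff for "binder" calls, not taken from the paper.
BINDER_CUTOFF_NM = 500.0


def summarize_ic50(predictions):
    """Combine per-algorithm IC50 predictions (in nM) for each peptide.

    `predictions` maps peptide -> {algorithm name: predicted IC50}; lower IC50
    means stronger predicted MHC binding. Returns tuples of
    (peptide, median IC50, best IC50, is_binder), sorted by median IC50.
    """
    rows = []
    for peptide, by_algorithm in predictions.items():
        values = list(by_algorithm.values())
        consensus = median(values)  # robust to one outlying algorithm
        best = min(values)          # strongest single prediction
        rows.append((peptide, consensus, best, consensus <= BINDER_CUTOFF_NM))
    rows.sort(key=lambda row: row[1])
    return rows
```

Reporting the best (lowest) IC50 alongside the median keeps a strong prediction from a single algorithm visible even when the consensus is weak.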

SciScore for 10.1101/2020.11.12.344424:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
A BLAST database of the 20,595 human proteins was made using the makeblastdb module, and the 93,413 SARS-CoV-2 proteins were searched against the database to find potentially homologous regions between SARS-CoV-2 proteins and human proteins (BLAST version 2.10.0+, released 16 December 2019).	BLAST suggested: (BLASTX, RRID:SCR_001653)
A database of profile hidden Markov models (profile HMM) for every human protein sequence was built by iteratively searching a profile HMM for each human protein against all 2,949,581 protein sequences of viruses infecting human, downloaded from NCBI Virus (accessed 26 June 2020), using the jackhmmer module in HMMER 3.3 (maximum number of iterations = 5, E-value threshold = 30).	HMMER suggested: (Hmmer, RRID:SCR_005305)
For in silico cleavage of the ORF1a and ORF1ab polyproteins, the 20-amino-acid sequences preceding the fifteen cleavage sites were retrieved from the UniProtKB COVID-19 resource (covid-19.uniprot.org) using the accessions P0DTC1 and P0DTD1 for ORF1a and ORF1ab, respectively.	UniProtKB suggested: (UniProtKB, RRID:SCR_004426)
First, 166,891 macromolecular structures in the mmCIF format were loaded into a Python environment (Python 3.7.6).	Python suggested: (IPython, RRID:SCR_001658)
Missing atoms in these structures were filled in with the complete_pdb function of the MODELLER Python package (version 9.24, released 6 April 2020) (Webb and Sali, 2016).	MODELLER suggested: (MODELLER, RRID:SCR_008395)
In the amino acid sequence retrieved from the mkdssp output, such gaps were filled with ‘X’ so that the BLASTP program could more accurately align the sequence from the protein structure to the original protein sequence.	BLASTP suggested: (BLASTP, RRID:SCR_001010)
Predicted protein structures of the SARS-CoV-2 proteome were retrieved from three sources: the SWISS-MODEL repository for the SARS-CoV-2 proteome (Waterhouse et al., 2018) (swissmodel.expasy.org/repository/species/2697049) (accessed 15 July 2020), SARS-CoV-2 structure modeling results from the C-I-TASSER pipeline (Huang et al., 2020) (zhanglab.ccmb.med.umich.edu/COVID-19/) (released 6 May 2020), and SARS-CoV-2 structure modeling results from the RaptorX pipeline and from refinement of Google’s AlphaFold SARS-CoV-2 protein structure models (Heo and Feig, 2020) (github.com/feiglab/sars-cov-2-proteins) (accessed 16 July 2020).	RaptorX suggested: (RaptorX, RRID:SCR_018118)
Also, only SARS-CoV-2 and human peptide pairs with significant similarity scores (about 20,000 pairs for both MHC classes) were subjected to MHC-binding predictions to further reduce the computation time. pVACtools version 1.5.9 (Hundal et al., 2020) was installed via Docker.	pVACtools suggested: None
The consensus SARS-CoV-2 protein sequences were given as an input file to the BepiPred 2.0 server.	BepiPred suggested: (BepiPred-2.0, RRID:SCR_018499)
For performing large numeric operations, NumJS version 0.15.1 was utilized.	NumJS suggested: None
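The BLAST row in the Resources table describes building a database of human proteins with makeblastdb and searching the SARS-CoV-2 proteins against it. Assuming a protein-vs-protein search with blastp and the default tabular output (`-outfmt 6`), the hits could be parsed as sketched below. The file names in the comment, the `parse_outfmt6` helper, and the E-value cutoff are illustrative assumptions, not the authors' settings.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical commands (file names are placeholders):
#   makeblastdb -in human_proteins.fasta -dbtype prot -out human_db
#   blastp -query sars_cov_2_proteins.fasta -db human_db -outfmt 6 -out hits.tsv

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(text, max_evalue=10.0):
    """Parse -outfmt 6 text and keep hits with E-value <= max_evalue (assumed cutoff)."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank or comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        hit = Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]), int(rec["length"]),
                  int(rec["qstart"]), int(rec["qend"]), int(rec["sstart"]), int(rec["send"]),
                  float(rec["evalue"]), float(rec["bitscore"]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits
```

The subject coordinates (`sstart`, `send`) locate the potentially homologous region within each human protein.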
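The HMMER row states that jackhmmer was run for each human protein against viral protein sequences with at most 5 iterations and an E-value threshold of 30. A sketch of how such a run could be scripted and its per-sequence table (`--tblout`) read back, assuming the threshold corresponds to jackhmmer's `-E` reporting threshold (it could instead refer to the `--incE` inclusion threshold); `jackhmmer_command`, `parse_tblout`, and the file names are hypothetical.

```python
def jackhmmer_command(query_fasta, target_fasta, tblout, max_iter=5, evalue=30.0):
    """Build a jackhmmer command line as a list of arguments.

    -N sets the maximum number of iterations; -E sets the per-sequence
    reporting E-value threshold. Mapping the paper's threshold to -E is an
    assumption.
    """
    return ["jackhmmer", "-N", str(max_iter), "-E", str(evalue),
            "--tblout", tblout, query_fasta, target_fasta]


def parse_tblout(text):
    """Return (target, query, full-sequence E-value, score) from --tblout text."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip header/comment lines
        # 18 whitespace-separated columns, then a free-text description.
        fields = line.split(maxsplit=18)
        hits.append((fields[0], fields[2], float(fields[4]), float(fields[5])))
    return hits
```

With HMMER 3.3 installed, the command list can be passed to `subprocess.run(cmd, check=True)`.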
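The UniProtKB row describes in silico cleavage of the ORF1a and ORF1ab polyproteins using the 20-residue sequences that precede each cleavage site. One straightforward way to apply such motifs is to locate each one in the polyprotein and cut immediately after it, as in this sketch. It assumes each motif occurs exactly once, and the function name is hypothetical; the toy example uses short motifs instead of the real 20-residue ones.

```python
def cleave_polyprotein(polyprotein, site_motifs):
    """Split a polyprotein right after each motif preceding a cleavage site.

    Raises ValueError if a motif is missing or occurs more than once.
    """
    cut_positions = []
    for motif in site_motifs:
        start = polyprotein.find(motif)
        if start == -1:
            raise ValueError(f"motif not found: {motif}")
        if polyprotein.find(motif, start + 1) != -1:
            raise ValueError(f"motif is not unique: {motif}")
        cut_positions.append(start + len(motif))  # cut after the motif's last residue
    pieces, previous = [], 0
    for cut in sorted(set(cut_positions)):
        pieces.append(polyprotein[previous:cut])
        previous = cut
    pieces.append(polyprotein[previous:])
    return [piece for piece in pieces if piece]
```

For example, `cleave_polyprotein("AAAAQBBBBQCCCC", ["AAAQ", "BBBQ"])` yields three mature products.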
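The BLASTP row mentions filling gaps in the sequence extracted from mkdssp output with ‘X’ before aligning it to the original protein sequence. A minimal sketch, assuming the structure's residues are available as (residue number, one-letter code) pairs and that insertion codes can be ignored: every residue number missing between two observed residues becomes an ‘X’, so the structure-derived sequence spans the same number of positions as the numbered region it covers. The function name is hypothetical.

```python
def sequence_with_gaps(residues):
    """Build a one-letter sequence from (residue number, amino acid) pairs,
    inserting 'X' for each residue number absent from the structure."""
    if not residues:
        return ""
    residues = sorted(residues)
    seq = [residues[0][1]]
    for (prev_num, _), (num, aa) in zip(residues, residues[1:]):
        gap = num - prev_num - 1  # number of unresolved residues in between
        if gap > 0:
            seq.append("X" * gap)
        seq.append(aa)
    return "".join(seq)
```

Using ‘X’ rather than simply concatenating the resolved residues prevents the aligner from joining sequence fragments that are not adjacent in the real protein.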
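The pVACtools row notes that only SARS-CoV-2/human peptide pairs with significant similarity scores were passed to MHC-binding prediction to save computation time. The scoring function is not given in this excerpt, so the sketch below substitutes a simple stand-in: ungapped k-mer pairs are kept when they share at least a minimum number of identical positions. The identity score, the function names, and the default values of `k` and `min_identical` are all assumptions.

```python
from itertools import product


def kmers(seq, k):
    """Return (start position, peptide) for every k-mer in seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def similar_peptide_pairs(viral_seq, human_seq, k=9, min_identical=6):
    """Return (viral_pos, human_pos, viral_peptide, human_peptide, n_identical)
    for ungapped k-mer pairs sharing at least `min_identical` identical residues.
    Identity counting is a stand-in for the paper's unspecified similarity score."""
    pairs = []
    for (i, v), (j, h) in product(kmers(viral_seq, k), kmers(human_seq, k)):
        identical = sum(a == b for a, b in zip(v, h))
        if identical >= min_identical:
            pairs.append((i, j, v, h, identical))
    return pairs
```

This all-against-all comparison is quadratic in sequence length, which is one reason a cheap prefilter is worthwhile before running slower MHC-binding predictors on the surviving pairs.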
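The BepiPred row indicates that consensus SARS-CoV-2 protein sequences were scored with the BepiPred 2.0 server, which assigns a score to each residue. A hedged sketch of turning such per-residue scores into candidate linear B-cell epitopes: runs of consecutive residues scoring at or above a threshold are reported as 0-based, half-open spans. The 0.5 threshold, the `min_length` filter, and the function name are assumptions, not settings reported by the authors.

```python
def epitope_regions(scores, threshold=0.5, min_length=1):
    """Return (start, end) half-open spans of consecutive residues whose
    score is >= threshold, keeping only spans of at least min_length residues."""
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new above-threshold run begins
        elif score < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))  # run extends to the last residue
    return regions
```

Raising `min_length` discards isolated high-scoring residues that are unlikely to form an antibody-accessible linear epitope on their own.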

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 8 and 11. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.
