Molecular Mimicry Map (3M) of SARS-CoV-2: Prediction of potentially immunopathogenic SARS-CoV-2 epitopes via a novel immunoinformatic approach

Abstract

Currently, more than 33 million people have been infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and more than a million people have died from coronavirus disease 2019 (COVID-19), the disease caused by the virus. There have been multiple reports of autoimmune and inflammatory diseases following SARS-CoV-2 infection. Several mechanisms have been suggested for the development of autoimmune diseases, including cross-reactivity (molecular mimicry). A typical workflow for discovering cross-reactive epitopes (mimotopes) starts with a sequence similarity search between the protein sequences of humans and a pathogen. However, sequence similarity alone is not enough to predict cross-reactivity between proteins, since proteins can share highly similar conformational epitopes whose amino acid residues are situated far apart in the linear protein sequences. Therefore, we used a hidden Markov model-based tool to identify distant viral homologs of human proteins. We also used experimentally determined and modeled protein structures of SARS-CoV-2 and human proteins to find homologous protein structures between them. Next, we predicted the binding affinity (IC50) of potentially cross-reactive T-cell epitopes to 34 MHC allelic variants that have been associated with autoimmune diseases, using multiple prediction algorithms. Overall, from 8,138 SARS-CoV-2 genomes, we identified 3,238 potentially cross-reactive B-cell epitopes covering six human proteins and 1,224 potentially cross-reactive T-cell epitopes covering 285 human proteins. To visualize the predicted cross-reactive T-cell and B-cell epitopes, we developed a web-based application, “Molecular Mimicry Map (3M) of SARS-CoV-2” (available at https://ahs2202.github.io/3M/).
The web application lets researchers explore potentially cross-reactive SARS-CoV-2 epitopes alongside custom peptide vaccines, so that they can identify potentially suboptimal peptide vaccine candidates, or less ideal parts of a whole-virus vaccine, and design safer vaccines for people with genetic and environmental predispositions to autoimmune diseases. Together, the computational resources and the interactive web application provide a foundation for investigating molecular mimicry in the pathogenesis of autoimmune disease following COVID-19.

Article activity feed

  1. SciScore for 10.1101/2020.11.12.344424:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    A BLAST database of the 20,595 human proteins was made using the makeblastdb module, and the 93,413 SARS-CoV-2 proteins were searched against the database to find potentially homologous regions between SARS-CoV-2 proteins and human proteins (BLAST version 2.10.0+, released 16 December 2019).
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    A database of profile hidden Markov models (profile HMMs) for every human protein sequence was built by iteratively searching a profile HMM for each human protein against all 2,949,581 protein sequences of viruses infecting humans, downloaded from NCBI Virus (accessed 26 June 2020), using the jackhmmer module in HMMER 3.3 (maximum number of iterations = 5, E-value threshold = 30).
    HMMER
    suggested: (Hmmer, RRID:SCR_005305)
    For in silico cleavage of the ORF1a and ORF1ab polyproteins, the 20-amino-acid sequences preceding the fifteen cleavage sites were retrieved from the UniProtKB COVID-19 resource (covid-19.uniprot.org) using the accessions P0DTC1 and P0DTD1 for ORF1a and ORF1ab, respectively.
    UniProtKB
    suggested: (UniProtKB, RRID:SCR_004426)
    First, 166,891 macromolecular structures in the mmCIF format were loaded into a Python environment (Python 3.7.6).
    Python
    suggested: (IPython, RRID:SCR_001658)
    Missing atoms in these structures were filled in with the complete_pdb function of the MODELLER Python package (version 9.24, released 6 April 2020) (Webb and Sali, 2016).
    MODELLER
    suggested: (MODELLER, RRID:SCR_008395)
    In the amino acid sequence retrieved from the mkdssp output, such gaps were filled with ‘X’ so that the BLASTP program could more accurately align the sequence from the protein structure to the original protein sequence.
    BLASTP
    suggested: (BLASTP, RRID:SCR_001010)
    Predicted protein structures of the SARS-CoV-2 proteome were retrieved from three different sources: the SWISS-MODEL repository for the SARS-CoV-2 proteome (Waterhouse et al., 2018) (swissmodel.expasy.org/repository/species/2697049) (accessed 15 July 2020), SARS-CoV-2 structure modeling results from the C-I-TASSER pipeline (Huang et al., 2020) (zhanglab.ccmb.med.umich.edu/COVID-19/) (released 6 May 2020), and SARS-CoV-2 structure modeling results from the RaptorX pipeline and refinement of Google’s AlphaFold SARS-CoV-2 protein structure models (Heo and Feig, 2020) (github.com/feiglab/sars-cov-2-proteins) (accessed 16 July 2020).
    RaptorX
    suggested: (RaptorX, RRID:SCR_018118)
    Also, only SARS-CoV-2 and human peptide pairs with significant similarity scores (about 20,000 pairs for both MHC classes) were subjected to MHC-binding predictions to further reduce the computation time. pVACtools version 1.5.9 (Hundal et al., 2020) was installed via Docker.
    pVACtools
    suggested: None
    The consensus SARS-CoV-2 protein sequences were given as an input file to the BepiPred 2.0 server.
    BepiPred
    suggested: (BepiPred-2.0, RRID:SCR_018499)
    For performing large numeric operations, NumJS version 0.15.1 was utilized.
    NumJS
    suggested: None

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 8 and 11. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
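
Two of the sequence-preparation steps quoted in the Resources table above (in silico cleavage of the ORF1a/ORF1ab polyproteins, and filling gaps with ‘X’ before BLASTP alignment) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `cleave_polyprotein` and `fill_structure_gaps` are hypothetical, the toy sequences are invented, and the sketch assumes that a cut is placed immediately after each upstream motif and that a gap is any jump in residue numbering.

```python
from typing import List, Sequence, Tuple


def cleave_polyprotein(polyprotein: str, upstream_motifs: Sequence[str]) -> List[str]:
    """Cut a polyprotein immediately after each upstream motif (hypothetical helper).

    Each motif is the stretch of residues that ends at a cleavage site.
    The quoted method uses 20-residue stretches; the toy data below is shorter.
    """
    cut_positions = []
    for motif in upstream_motifs:
        start = polyprotein.find(motif)
        if start == -1:
            continue  # motif not found (assumed: skip this site)
        cut_positions.append(start + len(motif))

    products = []
    previous = 0
    for cut in sorted(set(cut_positions)):
        products.append(polyprotein[previous:cut])
        previous = cut
    products.append(polyprotein[previous:])
    return [p for p in products if p]  # drop empty fragments


def fill_structure_gaps(residues: Sequence[Tuple[int, str]], gap_char: str = "X") -> str:
    """Build a sequence from (residue_number, one_letter_code) pairs.

    Assumption: a gap is any jump in residue numbering; one gap_char is
    inserted per missing residue so the sequence keeps its original length.
    """
    parts = []
    previous_number = None
    for number, letter in residues:
        if previous_number is not None and number > previous_number + 1:
            parts.append(gap_char * (number - previous_number - 1))
        parts.append(letter)
        previous_number = number
    return "".join(parts)


if __name__ == "__main__":
    # Invented toy data, not real SARS-CoV-2 sequence.
    print(cleave_polyprotein("AAAKGGCCCLQGDDD", ["KGG", "LQG"]))
    print(fill_structure_gaps([(1, "M"), (2, "K"), (5, "L"), (6, "V")]))
```

Padding with ‘X’ keeps residue positions in register with the full-length sequence, which is presumably why the quoted method reports more accurate BLASTP alignments after filling.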