MimicryDB-Auto: Structural Validation Reveals the Inadequacy of Sequence-Based Molecular Mimicry Screening in Autoimmunity


Abstract

Molecular mimicry, defined as structural or sequence similarity between pathogen-derived and host self-peptides sufficient to trigger cross-reactive immune responses, has been proposed as a mechanism of autoimmune triggering across rheumatoid arthritis, systemic lupus erythematosus, ankylosing spondylitis, systemic sclerosis, antiphospholipid syndrome, dermatomyositis, and Guillain-Barré syndrome. Computational identification of mimicry candidates has historically relied on sequence-based metrics, resting on the untested assumption that sequence similarity predicts structural similarity at the MHC-presented peptide level. We present MimicryDB-Auto, to our knowledge the first curated, labelled multi-pathogen dataset integrating MHC epitope prediction, sequence alignment, and atomic structural validation at the individual epitope level across both MHC class I and II presentations. It comprises 399 pathogen-host peptide pairs spanning 32 organisms and was constructed through a reproducible seven-step pipeline. Following structural validation using TM-align with RMSD < 2.0 Å, 262 pairs were classified as confirmed unbound structural mimics and 137 as non-mimics. Within the confirmed mimic pool, sequence identity explained at most 1.6% of variance in structural RMSD at both the 2.0 Å threshold (r = −0.127, p = 0.036, n = 272) and the stricter 1.0 Å threshold (r = −0.046, p = 0.562, n = 159), a relationship of no practical predictive utility under either threshold definition. A Random Forest classifier trained exclusively on sequence and immunological features achieved AUC-ROC = 0.958 (95% CI: 0.886–0.999), confirming that a multivariate sequence signal exists but is insufficient as a standalone substitute for structural validation. Cross-pairing validation further confirmed that 99.2% of structurally equivalent non-matched pairs had zero detectable sequence similarity, quantifying the scope of sequence-dissimilar structural mimicry invisible to conventional screening.
All structural comparisons were performed on unbound peptide conformations, representing a proxy for MHC-presented structure rather than direct immunological validation. MimicryDB-Auto and the complete pipeline are publicly available at https://github.com/minbaku/molecular-mimicry-RA-pipeline.
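
The "at most 1.6% of variance" figure follows from squaring the reported Pearson correlations. The sketch below reproduces that arithmetic from the values quoted in the abstract. It is an illustration only, not part of the authors' pipeline, and the names `REPORTED`, `variance_explained`, and `t_statistic` are hypothetical.

```python
import math

# Correlations and sample sizes exactly as quoted in the abstract.
REPORTED = {
    "RMSD < 2.0 A": {"r": -0.127, "n": 272},
    "RMSD < 1.0 A": {"r": -0.046, "n": 159},
}

# Class counts quoted in the abstract (mimics, non-mimics, total pairs).
N_MIMICS, N_NON_MIMICS, N_TOTAL = 262, 137, 399


def variance_explained(r: float) -> float:
    """Coefficient of determination (r squared) for a simple Pearson correlation."""
    return r * r


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r = 0 with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    for label, stats in REPORTED.items():
        r, n = stats["r"], stats["n"]
        pct = 100.0 * variance_explained(r)
        print(f"{label}: r^2 = {pct:.1f}% of variance, t = {t_statistic(r, n):.2f}")
    print("class counts sum to total:", N_MIMICS + N_NON_MIMICS == N_TOTAL)
```

Squaring r = −0.127 gives about 0.016, which is the 1.6% upper bound stated above; the 1.0 Å correlation explains roughly 0.2%.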
