STRUMP-I: Structure-based machine learning approach to pMHC-I binding prediction using force field energy features

Adam Voshall
Jeongjun Chae
Honglan Li
Junsu Ko
Woongyang Park
Eunjung Alice Lee
Yoonjoo Choi

Abstract

The adaptive immune system monitors cellular integrity by recognizing short peptides from intracellular proteins presented on Major Histocompatibility Complex class I (MHC-I) molecules, collectively termed peptide-MHC complexes (pMHC), enabling detection of foreign or mutated proteins. With the rising importance of immunotherapies targeting neoantigens in cancers, the ability to accurately predict which peptides will bind to the diverse population of MHC alleles is critically important. Current computational methods for pMHC-I prediction fall broadly into sequence-based methods, which rely heavily on large training datasets, and structure-based methods, which leverage structural modeling and the energetics of pMHC binding. Sequence-based methods are widely used, but their performance depends on the size and quality of the training data. Structure-based approaches, on the other hand, can generalize better across diverse MHC alleles, but they traditionally depend on identifying a single global minimum energy conformation, an assumption that often fails due to the inherent binding promiscuity of MHC-I molecules. To address these limitations, we developed STRUMP-I (STRUcture-based pMHC Prediction (for class I)), a novel pMHC binding prediction tool that directly leverages a broad set of force-field-derived energy terms as machine-learning features. STRUMP-I achieves performance comparable to state-of-the-art sequence-based models while significantly outperforming them on MHC alleles with limited representation in training data. Furthermore, STRUMP-I demonstrates strong synergy when integrated with sequence-based methods, notably enhancing prediction precision. The robustness and generalizability of STRUMP-I were confirmed by evaluating its predictive performance on independent, previously unseen datasets, including an experimentally validated cancer neoantigen dataset. This combined approach advances our capability to reliably identify clinically relevant neoantigen targets. The source code and trained models are available at https://github.com/yoonjoolab/STRUMP-I
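
The core idea in the abstract, feeding individual force-field energy terms to a classifier instead of thresholding a single total energy, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names, the toy energy values, and the choice of logistic regression are hypothetical and are not taken from STRUMP-I, whose actual features and model are defined in the linked repository.

```python
import math

# Hypothetical energy terms per modeled pMHC complex (arbitrary units).
# These names and values are illustrative only, not STRUMP-I's features.
FEATURES = ["vdw_attractive", "vdw_repulsive", "electrostatic", "solvation", "hbond"]

# Made-up toy data: three "binders" (label 1) and three "non-binders" (label 0).
X_RAW = [
    [-42.0, 6.1, -12.5, 9.0, -7.2],
    [-39.5, 5.4, -10.1, 8.2, -6.5],
    [-44.2, 7.0, -13.0, 9.8, -8.0],
    [-28.0, 9.5, -4.0, 6.0, -2.1],
    [-30.5, 11.2, -5.2, 6.6, -3.0],
    [-25.1, 8.8, -3.1, 5.1, -1.5],
]
Y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def standardize(rows):
    """Scale each energy term to zero mean and unit variance."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(dims)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in rows]


def train_logistic(X, y, lr=0.1, epochs=500):
    """Full-batch gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


X = standardize(X_RAW)
weights, bias = train_logistic(X, Y)

# Predicted binding probability for each training complex.
probs = [sigmoid(sum(wj * xj for wj, xj in zip(weights, xi)) + bias) for xi in X]

for name, wj in zip(FEATURES, weights):
    print(f"{name:>15}: weight {wj:+.3f}")
```

The learned weights show how each energy term contributes separately to the prediction, which is the contrast with relying on one global minimum energy value. A real pipeline would need held-out evaluation and far more data than this toy set.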

Version published to 10.1101/2025.09.03.674126 on bioRxiv
Sep 8, 2025

Feature-Optimized Machine Learning Benchmarking for Protein Interface Prediction in Permanent Homodimer Complexes with Distinct Structural Features

This article has 4 authors:
1. Tayyip Topuz
2. Zeki Erdem
3. Halil Bisgin
4. E. Demet Akten
This article has no evaluations. Latest version: Feb 2, 2026

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluations. Latest version: Dec 11, 2025

Multi-Modal Ensemble Learning for TLR4 Binding Prediction: Addressing Data Scarcity and Leakage in Small Molecule Drug Discovery

This article has 3 authors:
1. Brandon Yee
2. Maximilian Rutkowski
3. Wilson Collins
This article has no evaluations. Latest version: Jan 28, 2026
