Comprehensive evaluation and interpretative insights of peptide-HLA binding prediction tools using explainable artificial intelligence
Abstract
Accurate prediction of human leukocyte antigen class I (HLA-I) peptide binding is pivotal for immunological research, including vaccine development and immunotherapy. However, challenges such as tool performance variability, limited interpretability, and dataset quality hinder the broader applicability of existing models. Here, we conduct a comprehensive benchmarking of 17 prediction tools using a rigorously curated dataset of over 290,000 peptides across 44 HLA-I alleles. We assess model accuracy, robustness, and interpretability, incorporating explainability methods such as SHAP and LIME to elucidate prediction mechanisms. Our analysis reveals significant performance differences, with self-attention-based models (e.g., STMHCpan and BigMHC) demonstrating superior accuracy, while CapsNet-MHC_AN, a capsule network model, offers competitive performance. Models trained on eluted ligand data outperformed those using binding affinity data, highlighting the importance of high-quality training datasets. Ensemble and multi-algorithm strategies further enhanced prediction reliability. These findings emphasize the need for continued innovation in model design, the integration of diverse datasets, and the exploration of structural predictors, providing a framework for developing more accurate, interpretable, and clinically relevant HLA-I peptide binding prediction tools.