PHbinder and PSGM: A Cascaded Framework for Epitope Prediction and HLA-I Allele Identification

Abstract

The presentation of antigens by Human Leukocyte Antigen class I (HLA-I) molecules is a cornerstone of adaptive immunity. Although existing prediction tools such as NetMHCpan and MHCflurry predict the binding affinity between peptides and specific HLA-I alleles with high accuracy, they are restricted to a predefined set of alleles. Consequently, they can neither directly determine whether a peptide is an epitope nor provide a holistic binding profile across the entire HLA-I allelic landscape. To overcome these challenges, we introduce two synergistic models: PHbinder (Peptide-HLA-I Binder) and PSGM (Pseudo Sequence Generation and Mapping). PHbinder integrates features from an ESM2 language model fine-tuned with Low-Rank Adaptation (LoRA), processing them through parallel CNN and Transformer branches to capture local and global patterns, which are then fused using a Cross-Multi-Head Attention mechanism. In the epitope prediction task, PHbinder achieved a prediction accuracy of 85.12%, significantly exceeding established benchmark models. Complementing this, the PSGM model employs a Generative Adversarial Network (GAN) architecture to generate the HLA-I pseudo sequences corresponding to a given peptide. These generated sequences are then mapped to known alleles using a Hamming distance-based nearest-neighbor search. PSGM achieved an average coverage of 49.26% across its Top-50 predicted alleles. Furthermore, orthogonal validation with MHCflurry revealed that 63% of the highest-affinity binding partners within its Top-50 list were novel HLA-I alleles that have not yet been experimentally verified. Together, PHbinder and PSGM establish a cascaded framework that enables a precise “Peptide → Epitope Determination → HLA-I Alleles List” pipeline. This work accelerates the screening of immunogenic epitopes and provides a powerful upstream preprocessor for traditional prediction tools.
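
The mapping step described in the abstract, a Hamming distance-based nearest-neighbor search from a generated pseudo sequence to known alleles, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`hamming_distance`, `nearest_alleles`), the allele labels, and the six-residue toy strings are all hypothetical and chosen only for illustration. It assumes every pseudo sequence has the same length and breaks distance ties alphabetically by allele label.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_alleles(
    generated: str, known: Dict[str, str], top_k: int = 50
) -> List[Tuple[str, int]]:
    """Rank known alleles by Hamming distance to a generated pseudo sequence.

    Returns up to top_k (allele_label, distance) pairs, closest first.
    Ties are broken by allele label so the ranking is deterministic.
    """
    scored = [(name, hamming_distance(generated, seq)) for name, seq in known.items()]
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:top_k]


# Toy data: hypothetical labels and sequences, not real HLA-I pseudo sequences.
known_pseudo = {
    "allele_A": "YFAMYQ",
    "allele_B": "YFAMYE",
    "allele_C": "YYTKYR",
}

ranked = nearest_alleles("YFAMYQ", known_pseudo, top_k=2)
print(ranked)
```

With `top_k=50`, the same ranking would yield a Top-50 allele list of the kind the abstract evaluates; an exhaustive scan like this is linear in the number of known alleles, which is usually acceptable for an allele catalogue of modest size.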
