PHbinder and PSGM: A Cascaded Framework for Epitope Prediction and HLA-I Allele Identification
Abstract
The presentation of antigens by Human Leukocyte Antigen class I (HLA-I) molecules is a cornerstone of adaptive immunity. Although existing prediction tools such as NetMHCpan and MHCflurry predict the binding affinity between peptides and specific HLA-I alleles with high accuracy, they are restricted to a predefined set of alleles. Consequently, they can neither directly determine whether a peptide is an epitope nor provide a holistic binding profile across the entire HLA-I allelic landscape. To overcome these challenges, we introduce two synergistic models: PHbinder (Peptide-HLA-I Binder) and PSGM (Pseudo Sequence Generation and Mapping). PHbinder extracts features with an ESM2 language model fine-tuned via Low-Rank Adaptation (LoRA), processes them through parallel CNN and Transformer branches to capture local and global patterns, and fuses the two branches with a Cross-Multi-Head Attention mechanism. In the epitope prediction task, PHbinder achieved a prediction accuracy of 85.12%, significantly exceeding established benchmark models. Complementing this, the PSGM model employs a Generative Adversarial Network (GAN) architecture to generate the HLA-I pseudo sequences corresponding to a given peptide. These generated sequences are then mapped to known alleles using a Hamming distance-based nearest-neighbor search. PSGM achieved 49.26% average coverage in its predictions of the Top-50 alleles. Furthermore, orthogonal validation with MHCflurry revealed that 63% of the highest-affinity binding partners within its Top-50 list were novel HLA-I alleles that have not yet been experimentally verified. Together, PHbinder and PSGM establish a cascaded framework that enables a precise “Peptide → Epitope Determination → HLA-I Alleles List” pipeline. This work accelerates the screening of immunogenic epitopes and provides a powerful upstream preprocessor for traditional prediction tools.
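The mapping step described in the abstract, a Hamming distance-based nearest-neighbor search from a generated pseudo sequence to known alleles, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `hamming` and `nearest_alleles`, the allele labels, and the short toy sequences are all hypothetical, and the sketch assumes that every pseudo sequence has the same length and that ties are broken alphabetically by allele name.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_alleles(
    generated: str, known: Dict[str, str], k: int = 50
) -> List[Tuple[str, int]]:
    """Return the k known alleles closest to a generated pseudo sequence.

    `known` maps an allele name to its pseudo sequence. Ties are broken
    alphabetically by allele name (an assumption made for this sketch).
    """
    scored = [(name, hamming(generated, seq)) for name, seq in known.items()]
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:k]


# Toy data: made-up labels and sequences, not real HLA-I pseudo sequences.
known = {
    "ALLELE_A": "YFAMYQEN",
    "ALLELE_B": "YFAMYGEK",
    "ALLELE_C": "YYTKYREI",
}
generated = "YFAMYQEK"  # stands in for the output of a generator
top2 = nearest_alleles(generated, known, k=2)
print(top2)
```

With `k=50`, the same function would return a ranked Top-50 candidate list of the kind the abstract refers to. A brute-force scan like this is linear in the number of known alleles, which is likely adequate for an allele catalogue of modest size.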