Structure-based Predictions of Conformational B Cell Epitopes by Protein Language Model and Deep Learning
Abstract
Mapping conformational B-cell epitopes remains a central challenge for antibody discovery: experiments are costly, and most computational tools trained on generic protein-protein interfaces transfer poorly to antibody-antigen recognition. We introduce a patch-centric framework that predicts epitopes directly on antigen structures. Each surface "patch" is defined as a triad of neighboring residues, capturing the smallest local unit that encodes both shape and chemistry. We evaluate two classifiers: (i) a protein language model (PLM) approach that averages ESM-2 embeddings over each triad and scores the averaged embedding with a small multilayer perceptron, and (ii) a convolutional neural network (CNN) baseline that consumes a hand-crafted 15×20 feature matrix summarizing amino-acid identity, secondary structure, solvent accessibility, and shape index. Trained with five-fold cross-validation on 1,151 AbDb antibody-antigen complexes, the PLM model markedly outperforms the CNN at the patch level (e.g., F1≈0.986, ROC-AUC≈0.998). Aggregating patch scores to residues with an ensemble over all folds yields robust residue-wise performance, surpassing the CNN (ROC-AUC 0.689±0.072 vs. 0.548±0.018). Against widely used sequence- and structure-based tools on AbDb, our PLM achieves the best summary metrics (ROC-AUC 0.67, PR-AUC 0.56) with full coverage of all antigens. On five external complexes unseen during development, the model generalizes well (ROC-AUC 0.663) and, on qualitative inspection, localizes binding regions accurately. The method converts PLM representations into interpretable epitope likelihood maps, offering a practical aid for antigen prioritization, antibody engineering, and vaccine design.
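The two data-handling steps named in the abstract, averaging per-residue embeddings over a triad and aggregating patch scores back to residues, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional vectors standing in for ESM-2 embeddings, and the patch probabilities are all hypothetical, and the aggregation rule used here (a plain mean over every patch containing a residue) is an assumption, since the abstract does not state how patch scores are combined.

```python
from collections import defaultdict
from statistics import fmean


def triad_embedding(residue_embeddings, triad):
    """Mean-pool per-residue embedding vectors over the three residues of a patch."""
    vectors = [residue_embeddings[i] for i in triad]
    # zip(*vectors) walks the vectors dimension by dimension.
    return [fmean(column) for column in zip(*vectors)]


def residue_scores(patch_scores):
    """Average patch scores over every patch a residue belongs to (assumed rule)."""
    collected = defaultdict(list)
    for triad, score in patch_scores.items():
        for residue in triad:
            collected[residue].append(score)
    return {residue: fmean(scores) for residue, scores in collected.items()}


# Toy 2-dimensional embeddings standing in for ESM-2 vectors (hypothetical values).
embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.0, 0.0]}
pooled = triad_embedding(embeddings, (0, 1, 2))

# Hypothetical patch probabilities, as a patch classifier might emit them.
scores = residue_scores({(0, 1, 2): 0.9, (1, 2, 3): 0.3})
```

In this sketch, `pooled` would be the input to the patch classifier, and `scores` is the per-residue map that a residue-level metric such as ROC-AUC would be computed on. A residue shared by several patches receives the mean of their scores, so residues in consistently high-scoring neighborhoods rank highest.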