Structure-based Predictions of Conformational B Cell Epitopes by Protein Language Model and Deep Learning

Abstract

Mapping conformational B-cell epitopes remains a central challenge for antibody discovery: experiments are costly, and most computational tools trained on generic protein-protein interfaces transfer poorly to antibody-antigen recognition. We introduce a patch-centric framework that predicts epitopes directly on antigen structures. Each surface "patch" is defined as a triad of neighboring residues, capturing the smallest local unit that encodes both shape and chemistry. We evaluate two classifiers: (i) a protein language model (PLM) approach that averages ESM-2 embeddings over each triad and scores them with a small multilayer perceptron, and (ii) a convolutional neural network (CNN) baseline that consumes a hand-crafted 15×20 feature matrix summarizing amino-acid identity, secondary structure, solvent accessibility, and shape index. Trained with five-fold cross-validation on 1,151 AbDb antibody-antigen complexes, the PLM classifier markedly outperforms the CNN at the patch level (e.g., F1≈0.986, ROC-AUC≈0.998). Aggregating patch scores to residues with an ensemble over all folds yields robust residue-wise performance that surpasses the CNN (ROC-AUC 0.689±0.072 vs. 0.548±0.018). Against widely used sequence- and structure-based tools on AbDb, our PLM classifier achieves the best summary metrics (ROC-AUC 0.67, PR-AUC 0.56) with full coverage of all antigens. On five external complexes unseen during development, the model generalizes well (ROC-AUC 0.663) and, on qualitative inspection, localizes binding regions accurately. The method converts PLM representations into interpretable epitope likelihood maps, offering a practical aid for antigen prioritization, antibody engineering, and vaccine design.
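
The PLM pipeline described in the abstract (average per-residue embeddings over a triad, score the pooled vector with a small multilayer perceptron, then aggregate patch scores back to residues) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the toy embedding dimension, the random untrained weights, and the hand-picked triads are all assumptions made for illustration, and the abstract does not state how patch scores are combined per residue, so a plain mean over the patches containing each residue is assumed here. In practice the embeddings would come from ESM-2 and the triads from the antigen surface geometry.

```python
import numpy as np


def triad_embedding(residue_emb, triad):
    """Mean-pool per-residue embeddings over the three residues of one patch."""
    return residue_emb[list(triad)].mean(axis=0)


def mlp_score(x, w1, b1, w2, b2):
    """One-hidden-layer perceptron with ReLU and a sigmoid output in (0, 1)."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    logit = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))


def patches_to_residues(n_residues, triads, patch_scores):
    """Assumed aggregation: each residue gets the mean score of the patches
    that contain it; residues covered by no patch get 0.0."""
    totals = np.zeros(n_residues)
    counts = np.zeros(n_residues)
    for triad, score in zip(triads, patch_scores):
        for i in triad:
            totals[i] += score
            counts[i] += 1
    return np.divide(totals, counts, out=np.zeros(n_residues), where=counts > 0)


# Toy stand-ins (hypothetical): 6 residues, 8-dimensional "embeddings".
rng = np.random.default_rng(0)
n_residues, dim, hidden_dim = 6, 8, 4
residue_emb = rng.normal(size=(n_residues, dim))

# Hand-picked triads of residue indices; real triads would be surface neighbors.
triads = [(0, 1, 2), (1, 2, 3), (3, 4, 5)]

# Untrained random weights, used only to exercise the code path.
w1 = rng.normal(size=(dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=hidden_dim)
b2 = 0.0

patch_scores = [
    mlp_score(triad_embedding(residue_emb, t), w1, b1, w2, b2) for t in triads
]
residue_scores = patches_to_residues(n_residues, triads, patch_scores)
print(residue_scores)
```

An ensemble over cross-validation folds, as mentioned in the abstract, could be approximated by running this with each fold's weights and averaging the resulting residue score vectors; that averaging rule is likewise an assumption.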
