Structure-based Predictions of Conformational B Cell Epitopes by Protein Language Model and Deep Learning
Abstract
Mapping conformational B-cell epitopes remains a central challenge for antibody discovery: experiments are costly and most computational tools trained on generic protein–protein interfaces transfer poorly to antibody–antigen recognition. We introduce a patch-centric framework that predicts epitopes directly on antigen structures. Each surface “patch” is defined as a triad of neighboring residues, capturing the smallest local unit that encodes both shape and chemistry. We evaluate two classifiers: (i) a protein language model (PLM) approach that averages ESM-2 embeddings over each triad and scores them with a small multilayer perceptron [1], and (ii) a convolutional baseline that consumes a hand-crafted 15 × 20 feature matrix summarizing amino-acid identity, secondary structure, solvent accessibility, and shape index. Trained with five-fold cross-validation on 1,151 AbDb antibody–antigen complexes, the PLM model markedly outperforms the CNN at the patch level (e.g., F1 ≈ 0.986, ROC–AUC ≈ 0.998). Aggregating patch scores to residues with an ensemble over all folds yields robust residue-wise performance, surpassing the CNN (ROC–AUC 0.689 ± 0.072 vs. 0.548 ± 0.018). Against widely used sequence- and structure-based tools on AbDb, our PLM model achieves the best summary metrics (ROC–AUC 0.67, PR–AUC 0.56) with full coverage of all antigens. On five external complexes unseen during development, the model generalizes well (ROC–AUC 0.663) and accurately localizes binding regions qualitatively. The method converts PLM representations into interpretable epitope likelihood maps, offering a practical aid for antigen prioritization, antibody engineering, and vaccine design.
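The PLM pipeline summarized in the abstract (triad patches, mean-pooled per-residue embeddings, a small MLP, patch-to-residue aggregation, and an ensemble over the five folds) can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation. The triad rule (a residue plus its two nearest residues, with no surface filtering), the MLP parameter layout, the mean-over-patches aggregation, and every function name (`build_triads`, `ensemble_predict`, and so on) are assumptions. Random vectors stand in for ESM-2 embeddings, and untrained random weights stand in for the trained fold models.

```python
import math
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Triad = Tuple[int, int, int]
# (w1, b1, w2, b2) for a one-hidden-layer MLP; a hypothetical parameter layout.
MLPParams = Tuple[List[List[float]], List[float], List[float], float]


def build_triads(coords: Sequence[Coord]) -> List[Triad]:
    """Assumed patch rule: each residue plus its two nearest residues."""
    triads = []
    for i, ci in enumerate(coords):
        # Rank all other residues by Euclidean distance to residue i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        triads.append((i, others[0], others[1]))
    return triads


def triad_embedding(emb: Sequence[Sequence[float]], triad: Triad) -> List[float]:
    """Mean-pool the per-residue embedding vectors of the three residues."""
    return [sum(emb[r][d] for r in triad) / 3.0 for d in range(len(emb[0]))]


def mlp_score(x: Sequence[float], params: MLPParams) -> float:
    """ReLU hidden layer followed by a sigmoid output, giving a score in (0, 1)."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def residue_scores(n_res: int, triads: Sequence[Triad], patch_scores: Sequence[float]) -> List[float]:
    """Assumed aggregation: each residue gets the mean score of the patches containing it."""
    totals, counts = [0.0] * n_res, [0] * n_res
    for triad, s in zip(triads, patch_scores):
        for r in triad:
            totals[r] += s
            counts[r] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def ensemble_predict(fold_models: Sequence[MLPParams], emb, coords) -> List[float]:
    """Score every triad with each fold model, map to residues, then average over folds."""
    triads = build_triads(coords)
    feats = [triad_embedding(emb, t) for t in triads]
    per_fold = [
        residue_scores(len(coords), triads, [mlp_score(f, p) for f in feats])
        for p in fold_models
    ]
    return [sum(fold[r] for fold in per_fold) / len(per_fold) for r in range(len(coords))]


def random_mlp(dim: int, hidden: int, rng: random.Random) -> MLPParams:
    """Untrained random weights standing in for one trained fold model."""
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
    return w1, [0.0] * hidden, w2, 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    n_res, dim = 10, 8
    coords = [tuple(rng.uniform(0.0, 20.0) for _ in range(3)) for _ in range(n_res)]
    emb = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_res)]  # stand-in for ESM-2
    folds = [random_mlp(dim, 4, rng) for _ in range(5)]  # one model per cross-validation fold
    print([round(s, 3) for s in ensemble_predict(folds, emb, coords)])
```

Averaging over every patch that contains a residue is only one plausible aggregation rule; taking the maximum patch score instead would favor residues that sit in at least one strongly predicted patch.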