Structure-based Predictions of Conformational B Cell Epitopes by Protein Language Model and Deep Learning

Yuhao Zhang
Zhaoqian Su
Felipe Vilicich
Xiaohan Kuang
Yunchao Liu
Grace Zhang
Yinghao Wu

Abstract

Mapping conformational B-cell epitopes remains a central challenge for antibody discovery: experiments are costly, and most computational tools trained on generic protein–protein interfaces transfer poorly to antibody–antigen recognition. We introduce a patch-centric framework that predicts epitopes directly on antigen structures. Each surface “patch” is defined as a triad of neighboring residues, the smallest local unit that encodes both shape and chemistry. We evaluate two classifiers: (i) a protein language model (PLM) approach that averages ESM-2 embeddings over each triad and scores the result with a small multilayer perceptron [1], and (ii) a convolutional neural network (CNN) baseline that consumes a hand-crafted 15 × 20 feature matrix summarizing amino-acid identity, secondary structure, solvent accessibility, and shape index. Trained with five-fold cross-validation on 1,151 AbDb antibody–antigen complexes, the PLM model markedly outperforms the CNN at the patch level (e.g., F1 ≈ 0.986, ROC–AUC ≈ 0.998). Aggregating patch scores to residues with an ensemble over all folds yields robust residue-level performance that surpasses the CNN (ROC–AUC 0.689 ± 0.072 vs. 0.548 ± 0.018). Against widely used sequence- and structure-based tools on AbDb, our PLM model achieves the best summary metrics (ROC–AUC 0.67, PR–AUC 0.56) while covering all antigens. On five external complexes unseen during development, the model generalizes well (ROC–AUC 0.663) and, qualitatively, localizes the binding regions accurately. The method converts PLM representations into interpretable epitope likelihood maps, offering a practical aid for antigen prioritization, antibody engineering, and vaccine design.
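The patch-to-residue pipeline summarized in the abstract (triad patches, averaged embeddings, MLP scoring, fold-ensembled residue scores) can be illustrated with a short NumPy sketch. This is a hypothetical illustration, not the authors' code. The triad rule (each residue plus its two nearest neighbors by Cα distance), the one-hidden-layer MLP shape, random stand-ins for the ESM-2 embeddings and for the five trained fold models, and mean pooling of patch scores onto residues are all assumptions. The sketch also skips the surface-residue filtering a real pipeline would apply.

```python
import numpy as np


def build_triads(coords, k=2):
    """HYPOTHETICAL triad rule: each residue plus its k=2 nearest neighbours
    by C-alpha distance; duplicate triads are merged."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a residue is never its own neighbour
    triads = set()
    for i in range(len(coords)):
        nbrs = np.argsort(dists[i])[:k]
        triads.add(tuple(sorted([i, *(int(j) for j in nbrs)])))
    return sorted(triads)


def patch_features(emb, triads):
    """Average the per-residue embeddings over the three residues of each patch."""
    return np.stack([emb[list(t)].mean(axis=0) for t in triads])


def mlp_score(x, W1, b1, w2, b2):
    """Small MLP (assumed: one ReLU hidden layer, sigmoid output) -> patch probability."""
    h = np.maximum(0.0, x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))


def residue_scores(triads, patch_probs, n_res):
    """ASSUMED pooling rule: a residue's score is the mean over patches containing it."""
    total = np.zeros(n_res)
    count = np.zeros(n_res)
    for triad, p in zip(triads, patch_probs):
        for r in triad:
            total[r] += p
            count[r] += 1
    return np.divide(total, count, out=np.zeros(n_res), where=count > 0)


rng = np.random.default_rng(0)
n_res, dim, hidden = 30, 64, 16
coords = rng.normal(scale=10.0, size=(n_res, 3))  # stand-in C-alpha coordinates
emb = rng.normal(size=(n_res, dim))               # stand-in for per-residue ESM-2 embeddings

triads = build_triads(coords)
X = patch_features(emb, triads)

# Stand-ins for the five cross-validation fold models (random, untrained weights).
fold_scores = []
for fold in range(5):
    W1 = rng.normal(scale=0.1, size=(dim, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=hidden)
    b2 = 0.0
    fold_scores.append(residue_scores(triads, mlp_score(X, W1, b1, w2, b2), n_res))

# Residue-level ensemble: average the per-fold residue scores.
ensemble = np.mean(fold_scores, axis=0)
print(ensemble.round(3))
```

Averaging over fold models is one simple ensembling choice. The abstract does not state whether patch scores are pooled onto residues by mean, max, or another rule, so `residue_scores` should be read as a placeholder for that step.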

Version published to 10.1101/2025.10.29.685313 on bioRxiv
Oct 30, 2025

Related articles

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluations. Latest version: Dec 11, 2025

Feature-Optimized Machine Learning Benchmarking for Protein Interface Prediction in Permanent Homodimer Complexes with Distinct Structural Features

This article has 4 authors:
1. Tayyip Topuz
2. Zeki Erdem
3. Halil Bisgin
4. E. Demet Akten
This article has no evaluations. Latest version: Feb 2, 2026

LinkerMind: An Interpretable, Mechanism-Informed Deep Learning Framework for the De Novo Design of Antibody Drug Conjugate Linkers

This article has 1 author:
1. Martins Otun
This article has no evaluations. Latest version: Dec 19, 2025