H3BERTa: A CDR-H3 specific language model for antibody repertoire analysis

Chiara Rodella
Thomas Lemmin

Abstract

Antibodies are central to immune defense and therapeutic design, yet predicting which sequences confer functional activity remains challenging. Deep learning models trained on full variable regions often struggle due to sparse experimental data, signal dilution from conserved framework residues, and the extreme diversity of hypervariable loops. The heavy-chain complementarity-determining region 3 (CDR-H3) is the most variable segment shaping antigen specificity and driving immune diversity. Here, we present H3BERTa, a transformer-based language model trained solely on CDR-H3 sequences, to test whether this short region alone encodes enough biologically meaningful information. H3BERTa embeddings recapitulate biologically relevant sequence features, including J-gene usage and inferred B-cell maturation state. We further show that pseudo-perplexity profiles can be used to analyze repertoires, distinguishing healthy from HIV-1–derived sequences and suggesting measurable immune response signatures. Finally, these embeddings can support classifiers for broadly neutralizing antibodies (bnAbs) using limited labeled sequences, demonstrating their potential for accelerating antibody discovery. Together, our results indicate that the CDR-H3 region alone encodes a rich immunological signature, which H3BERTa robustly captures, providing a focused computational tool for analyzing repertoire diversity and informing antibody engineering.
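The pseudo-perplexity mentioned in the abstract is commonly defined for masked language models as the exponential of the negative mean log-probability of each residue when that residue alone is masked. The sketch below illustrates that general definition only; the abstract does not give H3BERTa's exact scoring procedure, so nothing here should be read as the authors' implementation. The names `pseudo_perplexity`, `Scorer`, and `make_frequency_scorer` are hypothetical, and the frequency-based scorer is a toy stand-in for a real masked model so that the example runs without any model weights.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Hypothetical scorer signature (an assumption for this sketch):
# (sequence, masked_position) -> log P(true residue at that position | rest)
Scorer = Callable[[str, int], float]

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseudo_perplexity(seq: str, log_prob_masked: Scorer) -> float:
    """Return exp(-(1/L) * sum_i log P(x_i | x_without_i)) for one sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    total = sum(log_prob_masked(seq, i) for i in range(len(seq)))
    return math.exp(-total / len(seq))


def make_frequency_scorer(corpus: Sequence[str]) -> Scorer:
    """Toy stand-in for a masked language model.

    It ignores context entirely and scores each residue by its
    add-one-smoothed frequency in the corpus. A real model would
    mask position i and read the probability from its output.
    """
    counts = Counter(ch for s in corpus for ch in s if ch in AMINO_ACIDS)
    total = sum(counts.values()) + len(AMINO_ACIDS)

    def scorer(seq: str, i: int) -> float:
        return math.log((counts[seq[i]] + 1) / total)

    return scorer


if __name__ == "__main__":
    # Made-up CDR-H3-like strings, for illustration only.
    scorer = make_frequency_scorer(["ARDYW", "ARGGY", "AKDLW"])
    for s in ["ARDYW", "CCCCC"]:
        print(s, round(pseudo_perplexity(s, scorer), 2))
```

With a real masked model, the scorer would be replaced by a function that masks one position at a time and returns the log-probability the model assigns to the original residue. Lower values mean the sequence looks more typical to the model, which is what makes per-sequence profiles comparable across repertoires.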

Version published to 10.1101/2025.11.03.686198 on bioRxiv
Nov 5, 2025

