Antibody affinity engineering using antibody repertoire data and machine learning

Lena Erlach
Simon Friedensohn
Daniel Neumeier
Derek M. Mason
Sai T. Reddy

This article has been Reviewed by the following groups

Listed in

Evaluated articles (Arcadia Science)

Abstract

Advanced antibody discovery and engineering workflows take advantage of the combination of high-throughput screening, deep sequencing and machine learning (ML). Most high-throughput methods, however, lack the resolution to provide absolute affinity values of antibody-antigen interactions, limiting their utility for precise engineering of binding kinetics. In this study, we utilize antibody repertoire data, affinity characterization and ML for antibody affinity engineering. Leveraging natural antibody sequence information from repertoires of immunized mice, we identified and experimentally measured affinities for 35 antigen-specific variants. Supervised ML models trained on these sequences achieved remarkable accuracy in predicting affinity, despite the limited dataset size. We utilized the trained ML model to design eight synthetic antibody variants in silico, of which seven exhibited the desired affinities. Our study illustrates the potential of this streamlined and efficient approach for precise engineering of the affinity of antibodies while reducing extensive experimental screening.

Arcadia Science
May 2, 2025

To visually examine the sequence-function relationship of the characterized antibody variants, both a network plot and a phylogenetic tree were generated

Given that your results clearly show a strong relationship between sequence similarity and binding affinity (in both the phylogenetic tree and network analysis), did you consider alternative strategies for sequence encoding, in particular ones that might capture some of this evolutionary signal? Examples include additional features derived from the phylogenetic tree, network-based distances, or embeddings from protein language models (like ESM).

These kinds of features might be especially valuable in a small-sample setting like this one and could further boost the predictive power of your models. Very nice study! Great to see creative and effective ways to leverage the power of small experimental datasets for protein function prediction.
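
The network-based distance idea raised in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the preprint: the sequences and log10(KD) values below are invented toy data, the function names are hypothetical, and the nearest-neighbor predictor is a stand-in for whatever supervised model one would actually train on such features.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def distance_features(seqs: Sequence[str]) -> List[List[int]]:
    """Encode each sequence as its edit distance to every sequence in the set."""
    return [[levenshtein(s, t) for t in seqs] for s in seqs]


def loo_nearest_neighbor(features: List[List[int]],
                         values: Sequence[float]) -> List[float]:
    """Leave-one-out prediction: each variant receives the measured value
    of its closest *other* variant (ties go to the lower index)."""
    preds = []
    for i, row in enumerate(features):
        others = [(d, j) for j, d in enumerate(row) if j != i]
        _, nearest = min(others)
        preds.append(values[nearest])
    return preds


# Hypothetical toy data: made-up CDR-like strings and made-up log10(KD) values.
toy_seqs = ["ARDYGSSY", "ARDYGSTY", "ARDWGSSY", "VKEGLRFA", "VKEGLRYA"]
toy_log_kd = [-9.1, -9.0, -8.8, -7.2, -7.0]

X = distance_features(toy_seqs)
predictions = loo_nearest_neighbor(X, toy_log_kd)
print(predictions)
```

Leave-one-out evaluation is used here because holding out one variant at a time is a common choice when only a few dozen measurements are available. The rows of `X` could equally be passed to a regression model in place of, or alongside, another sequence encoding.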

Version published to 10.1101/2025.01.10.632313 on bioRxiv
Jan 14, 2025

Related articles

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations
Latest version Dec 11, 2025

Reinforcement Learning-Augmented ProteinMPNN Improve the Binding Affinity of TNFR1-Targeting Minibinders

This article has 10 authors:
1. Zigong Wei
2. Lin Wei
3. Zhiyong Wu
4. Yang Hu
5. Yihe Fang
6. Miaomiao Geng
7. Banbin Xing
8. Jun Weng
9. Song Liu
10. Ke Ming
This article has no evaluations
Latest version Jan 21, 2026

Parameter-Efficient Adaptation of Large Language Models for Drug-Target Affinity Modeling in Drug Discovery

This article has 1 author:
1. Virendra Singh Kaira
This article has no evaluations
Latest version Jan 29, 2026
