DyAb: sequence-based antibody design and property prediction in a low-data regime

Joshua Yao-Yu Lin
Jennifer L. Hofmann
Andrew Leaver-Fay
Wei-Ching Liang
Stefania Vasilaki
Edith Lee
Pedro O. Pinheiro
Natasa Tagasovska
James R. Kiefer
Yan Wu
Franziska Seeger
Richard Bonneau
Vladimir Gligorijevic
Andrew Watkins
Kyunghyun Cho
Nathan C. Frey

This article has been Reviewed by the following groups

Listed in

Evaluated articles (Arcadia Science)

Abstract

Protein therapeutic design and property prediction are frequently hampered by data scarcity. Here we propose a new model, DyAb, that addresses this issue by leveraging a pair-wise representation to predict differences in protein properties, rather than absolute values. DyAb is built on top of a pre-trained protein language model and achieves a Spearman rank correlation of up to 0.85 on binding affinity prediction across molecules targeting three different antigens (EGFR, IL-6, and an internal target), given as few as 100 training data points. We employ DyAb in two design contexts: as a ranking model to score combinations of known mutations, and combined with a genetic algorithm to generate new sequences. Our method consistently generates novel antibody candidates with high binding rates, including designs that improve on the binding affinity of the lead molecule by more than ten-fold. DyAb represents a powerful tool for engineering therapeutic protein properties in the low-data regimes common in early-stage drug development.
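
To make the pair-wise idea concrete, the following is a minimal sketch of how a small set of labelled sequences could be turned into difference-labelled pairs. It is an illustration only, not the authors' implementation: the function name `make_pairs`, the toy sequences, and the affinity values are all hypothetical, and using every ordered pair is an assumption rather than something the abstract states.

```python
from itertools import permutations


def make_pairs(records):
    """Turn (sequence, value) records into ordered pairs labelled with
    the difference value_a - value_b (hypothetical helper)."""
    return [
        ((seq_a, seq_b), val_a - val_b)
        for (seq_a, val_a), (seq_b, val_b) in permutations(records, 2)
    ]


# Toy, made-up data: short sequence fragments with made-up affinity values.
records = [("QVQLVQ", 8.1), ("QVQLVE", 8.6), ("EVQLVQ", 7.9)]

pairs = make_pairs(records)

# Each ordered pair appears once, so n records give n * (n - 1) pairs.
for (seq_a, seq_b), delta in pairs:
    print(f"{seq_a} vs {seq_b}: delta = {delta:+.2f}")
```

Under this assumption, n labelled sequences yield n(n - 1) ordered training pairs, so 100 measurements would give 9,900 pairs. A model trained on such pairs would then predict the difference between a candidate and a reference sequence rather than an absolute value.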

Arcadia Science
Feb 22, 2025

Supplementary Fig. S

what do the different colors signify in the plots?

Arcadia Science
Feb 22, 2025

incorporating only mutations found in previously stable sequence

does that mean that the GA will not consider mutations not encountered in the binding affinity datasets?

Arcadia Science
Feb 22, 2025

Designs express and bind at consistently high rates (> 85%), comparable to that of single point mutants.

it would be interesting to see a naive control, i.e. what is the average expression and binding rate if you just make N point mutations at random?
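
A rough sketch of how the random variants for such a naive control might be generated, assuming uniform sampling over positions and over the 20 standard amino acids. The function name `random_point_mutant` and the example sequence are hypothetical; nothing here comes from the preprint.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_point_mutant(seq, n_mutations, rng):
    """Return a copy of seq with exactly n_mutations substitutions at
    distinct, uniformly chosen positions (hypothetical helper)."""
    positions = rng.sample(range(len(seq)), n_mutations)
    chars = list(seq)
    for pos in positions:
        # Exclude the current residue so every chosen position really changes.
        choices = [aa for aa in AMINO_ACIDS if aa != chars[pos]]
        chars[pos] = rng.choice(choices)
    return "".join(chars)


# Made-up sequence fragment, used only to show the call.
lead = "QVQLVQSGAEVKK"
mutant = random_point_mutant(lead, 3, random.Random(0))
```

Expression and binding rates for a batch of such variants would still have to be measured experimentally before they could be compared with the rates reported for the designs.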

Arcadia Science
Feb 22, 2025

66 pM, exhibiting a near 50-fold improvement

Very impressive!

Arcadia Science
Feb 22, 2025

DyAb performance on the regression task for design sets are shown in Supplementary Fig. S3

From S3a, it looks like DyAb is not very predictive on the lead A dataset, but performs much better on the others even though they have equal or fewer data points. Any idea why this is?

Version published to 10.1101/2025.01.28.635353 on bioRxiv
Feb 2, 2025

