Beyond Structure and Affinity: Context-Dependent Signals for de novo Binder Success

Çağlar Bozkurt

Abstract

De novo protein binder design has advanced rapidly, yet most designs fail experimentally and current structure- and affinity-centred evaluation does not reliably predict which candidates will succeed. Here we show that biology-informed sequence features, derived from models trained on natural proteins, identify transferable and context-dependent associations with binder expression and binding that are not captured by structural scoring alone.

We re-analysed two public benchmarks—the Bits to Binders CAR-T CD20 competition (11,984 designs; expression, proliferation, and T cell function gates) and the Adaptyv EGFR competition (603 designs; expression and binding affinity)—using five biology-informed ML models predicting disorder, amyloidogenicity, topology, PTM sites, and protein classification. Every feature was tested at each gate with FDR-corrected statistics.
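As an illustration of what testing every feature at a gate with FDR correction can look like, the sketch below adjusts a set of per-feature p-values with the Benjamini-Hochberg procedure. The abstract does not state which FDR method was used, so Benjamini-Hochberg is an assumption here, and the feature names and p-values are hypothetical placeholders rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: the abstract only says "FDR-corrected"; BH is used here
    as a common choice, not as the authors' confirmed method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-feature p-values at one gate (illustrative only).
features = ["aggregation", "disorder", "topology", "ptm_density"]
raw_p = [0.01, 0.04, 0.03, 0.005]

adjusted_p = benjamini_hochberg(raw_p)
significant = [f for f, q in zip(features, adjusted_p) if q < 0.05]
```

In a multi-gate analysis, the same function would be applied separately to the p-values collected at each gate (for example expression, then binding), so that each gate has its own controlled false discovery rate.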

We identify three layers of signal. Transferable: lower aggregation propensity is the most robust cross-benchmark signal; PTM-site density recurs univariately but is partly length-confounded in EGFR. Architecture-dependent: topology, disorder, and disulfide-related descriptors are significant in both datasets but flip direction, consistent with the different requirements of CAR extracellular domains versus standalone binders. Context-specific: phosphorylation-related associations with CAR-T depletion and low-disorder dominance in EGFR binding are tied to individual assay or format contexts. In the CAR-T benchmark, stacking biology-informed filters raises the enrichment hit rate from 13.8% to 38.6% (2.8× lift) after controlling for known sequence-level predictors.
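The reported lift is simply the ratio of the filtered hit rate to the baseline hit rate: 38.6% / 13.8% ≈ 2.8. The sketch below checks that arithmetic and shows one way stacked filters could be applied to a candidate pool. The field names, thresholds, and toy records are hypothetical and serve only to illustrate the calculation; they are not taken from the benchmark.

```python
def hit_rate(designs):
    """Fraction of designs flagged as hits (0.0 for an empty pool)."""
    if not designs:
        return 0.0
    return sum(1 for d in designs if d["hit"]) / len(designs)


def apply_stacked_filters(designs, filters):
    """Keep only designs that pass every filter in the stack."""
    return [d for d in designs if all(f(d) for f in filters)]


def lift(filtered_rate, baseline_rate):
    """Enrichment of the filtered pool relative to the baseline."""
    return filtered_rate / baseline_rate


# Arithmetic check on the figures quoted in the abstract.
reported_lift = round(lift(0.386, 0.138), 1)  # 2.8

# Toy pool with hypothetical feature scores (not real data).
pool = [
    {"agg": 0.2, "disorder": 0.1, "hit": True},
    {"agg": 0.3, "disorder": 0.6, "hit": False},
    {"agg": 0.8, "disorder": 0.2, "hit": False},
    {"agg": 0.9, "disorder": 0.7, "hit": False},
]

# Hypothetical thresholds; a real pipeline would choose these per gate.
filters = [
    lambda d: d["agg"] < 0.5,
    lambda d: d["disorder"] < 0.5,
]

kept = apply_stacked_filters(pool, filters)
toy_lift = lift(hit_rate(kept), hit_rate(pool))
```

Because the abstract finds that several descriptors flip direction between the CAR-T and EGFR settings, a filter stack like this would need to be defined per format and per gate rather than reused unchanged.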

These results suggest that pre-synthesis screening of de novo binders may benefit from being multi-gate and context-aware, using biology-informed sequence descriptors not only to rank candidates but also to help flag likely failure modes earlier and reduce wasted synthesis and testing.

Version published to 10.64898/2026.04.13.718094 on bioRxiv
Apr 15, 2026