Dual-encoder contrastive learning accelerates enzyme discovery

Jason W. Rocks
Dat P. Truong
Dmitrij Rappoport
Samuel Maddrell-Mander
Daniel A. Martin-Alarcon
Toni M. Lee
Steven Crossan
Joshua E. Goldford

Abstract

The ability to engineer enzymes for desired reactions is a cornerstone of modern biotechnology, yet identifying suitable starting proteins remains a critical bottleneck. Although contrastive learning offers a compelling computational approach for enzyme discovery, these models have yet to be implemented at scale or proven effective in real-world experimental settings. Here, we present Horizyn-1, a computationally efficient deep learning framework that enables large-scale reaction-to-enzyme recommendation validated through comprehensive experimental testing. Leveraging a combination of reaction fingerprints and protein language models, we trained Horizyn-1 on millions of reaction-enzyme pairs to achieve state-of-the-art performance, recovering an enzyme with correct activity within the top 100 hits for over 75% of test reactions. We experimentally validate Horizyn-1 across three enzyme discovery scenarios: identifying enzymes for orphan reactions, predicting enzyme promiscuity for both characterized and uncharacterized enzymes, and discovering enzymes for non-natural biochemical reactions including lysine-driven transaminations that enable efficient synthesis of non-canonical amino acids. On underrepresented reaction classes, we find that fine-tuning with fewer than 10 additional reactions can dramatically improve performance. Furthermore, a logarithmic scaling of model performance with training dataset size suggests continued improvement with larger and more diverse reaction datasets. Horizyn-1 addresses the critical bottleneck of sourcing initial enzymes for optimization campaigns, enabling efficient and scalable in silico screening for enzymes with desired activities and promising to accelerate future efforts in biocatalysis and metabolic engineering.
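The top-100 figure quoted above is a retrieval metric: for each test reaction, candidate enzymes are ranked by similarity to the reaction, and the reaction counts as a hit if any enzyme with the correct activity appears among the first k results. The following is a minimal sketch of that metric only, assuming reaction and enzyme embeddings already live in a shared vector space and are compared by cosine similarity. The function names, the similarity measure, and the toy vectors are illustrative assumptions, not details taken from Horizyn-1.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_hit_rate(reaction_vecs, enzyme_vecs, positives, k):
    """Fraction of reactions with at least one known-active enzyme in the top k.

    positives[i] is the set of enzyme indices annotated as active on reaction i.
    """
    hits = 0
    for i, r in enumerate(reaction_vecs):
        # Rank every candidate enzyme by similarity to this reaction, best first.
        ranked = sorted(
            range(len(enzyme_vecs)),
            key=lambda j: cosine(r, enzyme_vecs[j]),
            reverse=True,
        )
        if positives[i] & set(ranked[:k]):
            hits += 1
    return hits / len(reaction_vecs)


# Toy embeddings (hypothetical values, not data from the paper).
reactions = [[1.0, 0.0], [0.0, 1.0]]
enzymes = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
positives = [{0}, {2}]

rate_at_1 = top_k_hit_rate(reactions, enzymes, positives, k=1)
rate_at_2 = top_k_hit_rate(reactions, enzymes, positives, k=2)
print(rate_at_1, rate_at_2)
```

In this toy case the second reaction's active enzyme ranks second, so it is missed at k=1 and recovered at k=2. A real evaluation would replace the exhaustive sort with a vectorized or approximate nearest-neighbor search over the full enzyme collection.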

Version published to 10.1101/2025.08.21.671639 on bioRxiv
Aug 22, 2025