Benchmarking and Experimental Validation of Machine Learning Strategies for Enzyme Engineering

Zishuo Zeng
Jiao Jin
Rufang Xu
Xiaozhou Luo


Abstract

Enzyme-directed evolution increasingly relies on computational tools to prioritize mutations, yet their practical value is difficult to assess because kinetic data are often aggregated across heterogeneous assay conditions, inflating apparent generalization. Here we introduce EnzyArena, a curated benchmark in which kinetic parameters (kcat, Km and kcat/Km) are grouped by experimental campaign and measured under matched conditions, thereby minimizing confounders such as pH, temperature, reaction, and batch. Using EnzyArena, we systematically evaluate 20 models spanning four widely used strategies for guiding enzyme evolution: protein–ligand binding affinity prediction, protein stability prediction, zero-shot fitness prediction and enzyme kinetic parameter prediction. Across subsets derived from public databases and 25 independent mutagenesis datasets, most models exhibit weak, fragile or inconsistent correlation with catalytic activity. Kinetic-parameter predictors perform strongly on database-derived subsets but lose their advantage on independent datasets, whereas zero-shot predictors show more consistent generalization. A simple consensus of multiple zero-shot models further improves the precision of identifying beneficial mutants. We prospectively validated these findings in a wet-lab campaign (150 mutants) comparing random mutants, UniKP-prioritized mutants and ESM-1v-prioritized mutants (representing supervised kinetic-parameter prediction and zero-shot fitness prediction, respectively), where ESM-1v achieved the highest utility and UniKP underperformed the random baseline. Together, this study establishes realistic baselines for computational mutant prioritization and highlights consensus zero-shot strategies as a practical starting point for enzyme engineering.
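The campaign-wise evaluation described above can be illustrated with a minimal sketch. It assumes each record is a hypothetical `(campaign_id, predicted_score, measured_value)` tuple and uses Spearman rank correlation as the metric; the function names, the record layout, the minimum group size and the choice of metric are all illustrative assumptions, not the EnzyArena implementation.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def per_campaign_spearman(records, min_size=3):
    """Correlate predictions with measurements within each campaign only.

    records: iterable of (campaign_id, predicted_score, measured_value).
    Campaigns with fewer than min_size mutants are skipped (assumed cutoff).
    """
    groups = defaultdict(list)
    for campaign, predicted, measured in records:
        groups[campaign].append((predicted, measured))
    result = {}
    for campaign, pairs in groups.items():
        if len(pairs) < min_size:
            continue
        preds, meas = zip(*pairs)
        result[campaign] = spearman(list(preds), list(meas))
    return result


# Toy data (invented for illustration, not taken from the benchmark).
toy_records = [
    ("A", 0.1, 1.0), ("A", 0.5, 2.0), ("A", 0.9, 3.5),
    ("B", 0.2, 30.0), ("B", 0.4, 20.0), ("B", 0.8, 10.0),
    ("C", 0.3, 5.0), ("C", 0.6, 6.0),
]
campaign_scores = per_campaign_spearman(toy_records)
```

Scoring each campaign separately keeps differences in assay conditions between campaigns from contributing to the correlation, which is the motivation the abstract gives for grouping measurements.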

Zero-shot fitness predictors are the most competitive and generalizable class, but still achieve only modest correlations on independent datasets. Notably, a simple consensus of multiple zero-shot models substantially improves the precision of identifying superior mutants and currently offers the most practical computational benefit for enzyme engineering. Our work establishes EnzyArena as a fair benchmark for enzyme activity prediction, defines realistic performance baselines for existing tools, and highlights consensus zero-shot strategies as the most useful computational tool in practice for enzyme-directed evolution.
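The abstract does not state the consensus rule, so the sketch below assumes one plausible version: a mutant is selected only if it falls in the top fraction of every model's ranking, and precision is the share of selected mutants whose measured activity exceeds wild type. The model names, mutant labels, scores and the `top_fraction` parameter are hypothetical.

```python
def top_set(scores, top_fraction):
    """Return the mutants in the top fraction of one model's scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:k])


def consensus_select(scores_by_model, top_fraction=0.5):
    """Keep only mutants that every model places in its top fraction."""
    selected = None
    for scores in scores_by_model.values():
        top = top_set(scores, top_fraction)
        selected = top if selected is None else selected & top
    return selected or set()


def precision(selected, relative_activity, wild_type=1.0):
    """Fraction of selected mutants with activity above wild type."""
    if not selected:
        return float("nan")
    hits = sum(1 for m in selected if relative_activity[m] > wild_type)
    return hits / len(selected)


# Invented zero-shot scores from two hypothetical models.
scores_by_model = {
    "model_a": {"A10G": 0.9, "L25F": 0.2, "K40R": 0.7, "D77N": -0.5},
    "model_b": {"A10G": 0.8, "L25F": 0.6, "K40R": -0.1, "D77N": -0.3},
}
# Invented measured activity relative to wild type (wild type = 1.0).
relative_activity = {"A10G": 1.8, "L25F": 0.9, "K40R": 0.7, "D77N": 0.4}

consensus = consensus_select(scores_by_model, top_fraction=0.5)
consensus_precision = precision(consensus, relative_activity)
single_precision = precision(
    top_set(scores_by_model["model_a"], 0.5), relative_activity
)
```

In this toy case the intersection discards the mutants that only one model favors, which raises precision at the cost of a smaller candidate set. Whether that trade-off holds on real data depends on how correlated the models' errors are.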

Version published on bioRxiv, Mar 30, 2026 (DOI: 10.64898/2026.03.29.715152)

Related articles

Deep learning for enzyme kcat prediction: what works, what doesn't, and why?

This article has 1 author:
1. Liangzhen Zheng
This article has no evaluations. Latest version: Apr 9, 2026

CombinGym: a benchmark platform for machine learning-assisted design of combinatorial protein variants

This article has 8 authors:
1. Yongcan Chen
2. Lihao Fu
3. Xuchao Lu
4. Wenzhuo Li
5. Yuan Gao
6. Yibo Wang
7. Zhicheng Ruan
8. Tong Si
This article has no evaluations. Latest version: Mar 25, 2026

Systematic Benchmarking of Kinase Bioactivity Models Across Splitting Strategies and Protein Representations

This article has 1 author:
1. Joshua M. Abbott
This article has no evaluations. Latest version: Apr 22, 2026
