Benchmarking and Experimental Validation of Machine Learning Strategies for Enzyme Engineering

Abstract

Enzyme-directed evolution increasingly relies on computational tools to prioritize mutations, yet their practical value is difficult to assess because kinetic data are often aggregated across heterogeneous assay conditions, inflating apparent generalization. Here we introduce EnzyArena, a curated benchmark in which kinetic parameters (kcat, Km and kcat/Km) are grouped by experimental campaign and measured under matched conditions, thereby minimizing confounders such as pH, temperature, reaction, and batch. Using EnzyArena, we systematically evaluate 20 models spanning four widely used strategies for guiding enzyme evolution: protein–ligand binding affinity prediction, protein stability prediction, zero-shot fitness prediction and enzyme kinetic parameter prediction. Across subsets derived from public databases and 25 independent mutagenesis datasets, most models exhibit weak, fragile or inconsistent correlation with catalytic activity. Kinetic-parameter predictors perform strongly on database-derived subsets but lose their advantage on independent datasets, whereas zero-shot predictors show more consistent generalization. A simple consensus of multiple zero-shot models further improves the precision of identifying beneficial mutants. We prospectively validated these findings in a wet-lab campaign (150 mutants) comparing random mutants, UniKP-prioritized mutants and ESM-1v-prioritized mutants (representing supervised kinetic-parameter prediction and zero-shot fitness prediction, respectively), where ESM-1v achieved the highest utility and UniKP underperformed the random baseline. Together, this study establishes realistic baselines for computational mutant prioritization and highlights consensus zero-shot strategies as a practical starting point for enzyme engineering.
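The campaign-grouped evaluation described above can be illustrated with a minimal sketch. It assumes that each record is a (campaign, predicted score, measured value) triple and that rank correlation is computed separately within each campaign instead of over the pooled data. The abstract does not state the exact metric or data layout, so the Spearman coefficient and the names `average_ranks`, `spearman`, and `per_campaign_spearman` are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def per_campaign_spearman(records):
    """Group (campaign, predicted, measured) records and score each campaign."""
    groups = defaultdict(lambda: ([], []))
    for campaign, predicted, measured in records:
        groups[campaign][0].append(predicted)
        groups[campaign][1].append(measured)
    return {c: spearman(p, m) for c, (p, m) in groups.items()}


# Hypothetical toy data: two campaigns measured under different conditions.
records = [
    ("campaign_A", 1.0, 10.0), ("campaign_A", 2.0, 20.0), ("campaign_A", 3.0, 30.0),
    ("campaign_B", 1.0, 3.0), ("campaign_B", 2.0, 2.0), ("campaign_B", 3.0, 1.0),
]
result = per_campaign_spearman(records)
```

Scoring each campaign on its own keeps differences in pH, temperature, or batch between campaigns from contributing to the apparent correlation.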

Zero-shot fitness predictors are the most competitive and generalizable class, but still achieve only modest correlations on independent datasets. Notably, a simple consensus of multiple zero-shot models substantially improves the precision of identifying superior mutants and currently offers the most practical computational benefit for enzyme engineering. Our work establishes EnzyArena as a fair benchmark for enzyme activity prediction, defines realistic performance baselines for existing tools, and highlights consensus zero-shot strategies as the most useful computational approach in practice for the directed evolution of enzymes.
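One way to picture a consensus of zero-shot models is rank averaging followed by a precision check on the top-ranked mutants. The abstract does not specify how the consensus is formed, so the sketch below is an assumption: the function names, the model labels `model_a` and `model_b`, the scores, and the definition of "beneficial" are all hypothetical.

```python
def ordinal_ranks(scores):
    """1-based ranks, lowest score first (ties broken by position; a simplification)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def consensus_scores(model_scores):
    """Average each mutant's rank across models; a higher value means more models favor it."""
    n_mutants = len(next(iter(model_scores.values())))
    totals = [0.0] * n_mutants
    for scores in model_scores.values():
        for i, rank in enumerate(ordinal_ranks(scores)):
            totals[i] += rank
    return [t / len(model_scores) for t in totals]


def precision_at_k(scores, beneficial, k):
    """Fraction of the k highest-scoring mutants that are labeled beneficial."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(1 for i in top if beneficial[i]) / k


# Hypothetical scores for four mutants from two zero-shot models.
model_scores = {
    "model_a": [0.9, 0.1, 0.5, 0.3],
    "model_b": [0.8, 0.2, 0.1, 0.6],
}
# Hypothetical labels: True if the measured activity exceeds the wild type.
beneficial = [True, False, False, True]

consensus = consensus_scores(model_scores)
precision = precision_at_k(consensus, beneficial, k=2)
```

Averaging ranks instead of raw scores avoids having to put the models' outputs on a common scale, which is one reason it is a common choice for simple ensembles.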
