Deep learning for enzyme kcat prediction: what works, what doesn't, and why?


Abstract

Deep learning models for turnover number (kcat) prediction have been widely reported in recent years, yet their practical utility in enzyme engineering remains unclear. We benchmarked five state-of-the-art models using multi-dimensional tests that go beyond standard regression metrics. Training data are severely biased toward oxidoreductases and ATP-like substrates, with sparse coverage of sequence and chemical space. Across diverse independent test sets (including temporal hold-out data, novel enzymes, deep mutational scans, and enzyme-inhibitor pairs), all models failed to predict absolute kcat accurately. Moreover, their ranking capability collapsed for sequences dissimilar to the training data. Crucially, the models exhibit a striking asymmetry: they respond to active-site disruptions but ignore substrate chemistry, and cannot distinguish substrates from products or inhibitors. Experimental validation on 98 PjxA xylanase mutants proposed by these models confirms low predictive utility (global correlations <0.3, positive rate <=10%). These findings indicate that current models function as pattern-recognition predictors rather than mechanism-aware ones: they lack the chemical awareness required for reliable industrial application, highlighting a critical gap in the field.
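The abstract reports two kinds of evaluation: a global correlation between predicted and measured kcat, and a "positive rate" among model-proposed mutants. As a minimal sketch of what such metrics might look like, the snippet below computes a Spearman rank correlation and the fraction of mutants whose measured kcat exceeds wild type. Function names, tie handling, and the example values are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of two benchmark-style metrics mentioned in the abstract.
# All names and numbers here are illustrative, not from the study.

def _ranks(values):
    # Simple ranking; ties are broken by order of appearance
    # (a full Spearman implementation would average tied ranks).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(pred, meas):
    # Spearman rho = Pearson correlation computed on the ranks.
    rx, ry = _ranks(pred), _ranks(meas)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def positive_rate(measured_mutant_kcats, wt_kcat):
    # Share of model-proposed mutants whose measured kcat beats wild type.
    hits = sum(1 for k in measured_mutant_kcats if k > wt_kcat)
    return hits / len(measured_mutant_kcats)
```

A ranking metric like this is what "ranking capability collapsed" refers to: even when absolute kcat values are wrong, a useful model should still order mutants correctly, and a positive rate near 10% means few proposed mutants actually improved on wild type.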
