Deep learning for enzyme kcat prediction: what works, what doesn't, and why?


Abstract

Deep learning models for turnover number (kcat) prediction have been widely reported in recent years, yet their practical utility in enzyme engineering remains unclear. We benchmarked five state-of-the-art models using multi-dimensional tests that go beyond standard regression metrics. Training data are severely biased toward oxidoreductases and ATP-like substrates, with sparse coverage of sequence and chemical space. Across diverse independent test sets (including temporal hold-out data, novel enzymes, deep mutational scans, and enzyme-inhibitor pairs), all models failed to predict absolute kcat accurately. Moreover, their ranking capability collapsed for sequences dissimilar to the training data. Crucially, the models exhibit a striking asymmetry: they respond to active-site disruptions but ignore substrate chemistry, and cannot distinguish substrates from products or inhibitors. Experimental validation on 98 PjxA xylanase mutants proposed by these models confirms low predictive utility (global correlations <0.3, positive rate <=10%). These findings indicate that current models function as pattern-recognition predictors rather than mechanism-aware ones: they lack the chemical awareness required for reliable industrial application, highlighting a critical gap in the field.
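The abstract reports two kinds of evaluation: a global correlation between predicted and measured kcat, and a "positive rate" among model-proposed mutants. As a minimal sketch of what such metrics might look like, the snippet below computes a Spearman rank correlation and the fraction of mutants whose measured kcat exceeds wild type. Function names, tie handling, and the example values are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of two benchmark-style metrics mentioned in the abstract.
# All names and numbers here are illustrative, not from the study.

def _ranks(values):
    # Simple ranking; ties are broken by order of appearance
    # (a full Spearman implementation would average tied ranks).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(pred, meas):
    # Spearman rho = Pearson correlation computed on the ranks.
    rx, ry = _ranks(pred), _ranks(meas)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def positive_rate(measured_mutant_kcats, wt_kcat):
    # Share of model-proposed mutants whose measured kcat beats wild type.
    hits = sum(1 for k in measured_mutant_kcats if k > wt_kcat)
    return hits / len(measured_mutant_kcats)
```

A ranking metric like this is what "ranking capability collapsed" refers to: even when absolute kcat values are wrong, a useful model should still order mutants correctly, and a positive rate near 10% means few proposed mutants actually improved on wild type.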
