Robust Prediction of Enzyme Variant Kinetics with RealKcat
Abstract
Predicting enzyme kinetics directly from sequence remains a central challenge in computational biology, particularly in resolving the effects of mutations at catalytically essential residues. Existing models frequently overlook the functional consequences of such perturbations, often defaulting to wild-type predictions even in cases of substantial activity loss, thereby limiting their reliability for enzyme design and mechanistic inference. Here, we introduce RealKcat, a machine learning framework trained on KinHub-27k, a rigorously curated dataset of 27,176 experimentally reported enzyme–substrate entries consolidated from BRENDA, SABIO-RK, and UniProt and verified across 2,158 primary sources. To ensure biochemical realism, kinetic parameters were collapsed into order-of-magnitude bins, enabling predictions that are tolerant to experimental noise yet sensitive to functional shifts. RealKcat integrates ESM embeddings for enzyme sequences with ChemBERTa embeddings of the associated substrate, producing a unified feature space of the chemical conversion that supports robust multi-class classification of both catalytic turnover ($k_{cat}$) and substrate affinity ($K_M$). Across cross-validation, hold-out, out-of-distribution, and few-shot evaluations, including a dense mutational landscape of alkaline phosphatase (PafA), RealKcat consistently captured the direction and magnitude of mutation-induced changes, while preserving discrimination in both wild-type and mutant contexts. Importantly, structural descriptors were deliberately excluded, as naive integration of structural features has been shown to impair model generalization, underscoring the primacy of rigorous dataset curation, biologically informed task formulation, and balanced evaluation metrics. RealKcat establishes a scalable and mutation-sensitive framework for enzyme kinetics prediction, offering a biologically grounded platform for enzyme engineering, metabolic modeling, and therapeutic design.
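The order-of-magnitude binning and the joint enzyme–substrate feature space described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `magnitude_bin` and `joint_features`, the clipping bounds `lo_exp` and `hi_exp`, and the use of plain concatenation to combine the two embeddings are all assumptions made for illustration; the abstract does not state the bin edges or how the embeddings are combined.

```python
import numpy as np


def magnitude_bin(value, lo_exp=-2, hi_exp=5):
    """Map positive kinetic values to order-of-magnitude class indices.

    lo_exp and hi_exp are hypothetical clipping bounds (powers of ten);
    the source does not specify the actual bin edges.
    """
    value = np.asarray(value, dtype=float)
    if np.any(value <= 0):
        raise ValueError("kinetic parameters must be positive")
    # floor(log10(x)) gives the decade the value falls in
    exponent = np.floor(np.log10(value)).astype(int)
    # Clip extreme decades into the outermost classes, then shift to start at 0
    return np.clip(exponent, lo_exp, hi_exp) - lo_exp


def joint_features(enzyme_vec, substrate_vec):
    """Combine an enzyme embedding and a substrate embedding into one vector.

    Concatenation is an assumption; the vectors here stand in for
    ESM and ChemBERTa embeddings computed elsewhere.
    """
    return np.concatenate([np.asarray(enzyme_vec), np.asarray(substrate_vec)])


# Two measurements of the same decade land in the same class,
# while a 100-fold activity loss moves the label by two classes.
wild_type = magnitude_bin(310.0)
replicate = magnitude_bin(120.0)
mutant = magnitude_bin(3.1)
print(wild_type, replicate, mutant)
```

Under these assumed bounds, measurement scatter within a decade leaves the class label unchanged, whereas a large mutation-induced drop changes it, which is the noise-tolerant but shift-sensitive behavior the abstract attributes to binning.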