Towards automatic derivation of geometry-based descriptors as surrogates for complex computational approaches in enzyme-substrate prediction
Abstract
Accurate prediction of enzyme-substrate interactions remains a fundamental challenge in biocatalysis and drug discovery. While machine learning approaches have shown promise, they require extensive training data and often lack mechanistic interpretability. Here, we present a novel methodology that automatically derives geometry-based descriptors from enzyme-substrate complex structures to predict substrate specificity. Our approach simplifies complex catalytic mechanisms into interpretable geometric filters comprising critical inter-atomic distances and atomic-pair accessibility parameters. We validated this methodology using two mechanistically distinct enzyme families with minimal training data: haloalkane dehalogenases (9 enzymes and 53 substrates) and aldehyde reductases (9 enzymes and 36 substrates). The filters demonstrated robust performance across chemically diverse substrates. On testing datasets, the derived filters achieved an average accuracy of 77% and sensitivity of 94% for haloalkane dehalogenases and an average recall of 57% for true substrates of aldehyde reductases, outperforming state-of-the-art machine learning methods for substrate prediction on these datasets. Crucially, the geometric descriptors directly correspond to catalytic requirements, providing mechanistic insights into substrate recognition. This interpretable, mechanism-based approach requires minimal training data and can be readily applied to newly characterized enzymes, offering a powerful tool for enzyme engineering and substrate screening applications.
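
To make the idea of a geometric filter concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a filter is a set of upper bounds on distances between named atoms in a docked enzyme-substrate complex, and it leaves out the accessibility terms because the abstract does not define them. The atom labels, thresholds, and function names are hypothetical. The second function shows how accuracy and sensitivity (recall of true substrates) are computed from binary predictions.

```python
import math
from dataclasses import dataclass


@dataclass
class DistanceCriterion:
    """One hypothetical filter term: two atom labels and a distance cutoff."""
    atom_a: str
    atom_b: str
    max_distance: float  # in angstroms; the value is an assumption


def passes_filter(coords, criteria):
    """Return True if every inter-atomic distance is within its cutoff.

    coords maps an atom label to an (x, y, z) tuple in angstroms.
    """
    return all(
        math.dist(coords[c.atom_a], coords[c.atom_b]) <= c.max_distance
        for c in criteria
    )


def accuracy_and_sensitivity(predicted, actual):
    """Compute accuracy and sensitivity from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity is undefined when there are no true substrates.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical filter: a nucleophile oxygen must sit near the substrate carbon.
example_filter = [DistanceCriterion("nucleophile_O", "substrate_C", 3.5)]

# Hypothetical coordinates for two docked poses.
close_pose = {"nucleophile_O": (0.0, 0.0, 0.0), "substrate_C": (3.0, 0.0, 0.0)}
far_pose = {"nucleophile_O": (0.0, 0.0, 0.0), "substrate_C": (5.0, 0.0, 0.0)}

predictions = [passes_filter(close_pose, example_filter),
               passes_filter(far_pose, example_filter)]
```

In this sketch a pose counts as a predicted substrate only when all criteria hold at once, which is one simple way to combine several geometric requirements; other combination rules are possible.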