Prediction of enzyme functions and rational design of enzyme variants using GEnESIS: Graph-based Enzyme Evolution with Structure-Informed Scoring

Satoshi Yuzawa
Christopher J. Vavricka

Abstract

Engineered enzymes can be used to produce valuable chemicals sustainably. However, current machine learning-based enzyme prediction models depend on database annotations and primarily use sequence-level information, without incorporating large datasets of protein-ligand three-dimensional (3D) structures. We therefore created a graph neural network (GNN) to predict enzyme substrate affinity from protein 3D structures. When the model was trained on 40,718 cytochrome P450 (P450) structures, docking-based affinity and predicted affinity for the model aromatic substrate tyrosine were strongly correlated (R² = 0.87). This result indicated that the model performed well as a feature extractor and captured non-linear relationships between substrate and protein structures. Unsupervised clustering based on docking poses suggested that 854 of the identified P450s have high potential to convert tyrosine to L-3,4-dihydroxyphenylalanine (L-DOPA). Optimal P450s were ranked using substrate graph-based clustering and predicted affinity, and the active sites of the best candidates placed tyrosine in close proximity to the catalytic iron. Eigenvector centrality of the graph representation of the selected enzyme's 3D structure was then used to rapidly design highly reactive enzyme variants, and the GNN-based affinity prediction model was used to score these high-potential variants. Selecting amino acid residues by eigenvector centrality sampled high-affinity variants more effectively than random residue selection (p = 0.0012). MD simulations supported the improved reactivity of variants optimized through graph eigenvector centrality. This structure-based GNN approach will be used to accelerate the directed evolution of novel P450-catalyzed reactions.
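The abstract does not specify how the structure graph is built, so the following is only a minimal sketch of the general idea of ranking residues by eigenvector centrality, not the authors' implementation. It assumes (hypothetically) that each residue is a node, that two residues share an unweighted edge when their C-alpha atoms lie within an 8 Å cutoff, and that centrality is computed by power iteration. The function names `contact_graph`, `eigenvector_centrality`, and `top_central_residues` and the cutoff value are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_graph(coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Hypothetical residue contact graph: residues i != j are linked (1)
    when their C-alpha coordinates are within `cutoff` angstroms."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj


def eigenvector_centrality(adj: List[List[int]], max_iter: int = 1000,
                           tol: float = 1e-10) -> List[float]:
    """Power iteration for the leading eigenvector of the adjacency matrix.

    Each step multiplies by (A + I); the identity shift leaves the eigenvectors
    unchanged but prevents oscillation on bipartite graphs."""
    n = len(adj)
    if n == 0:
        return []
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in new))  # > 0 because x stays positive
        new = [v / norm for v in new]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


def top_central_residues(coords: Sequence[Coord], k: int = 3,
                         cutoff: float = 8.0) -> List[int]:
    """Indices of the k residues with the highest eigenvector centrality,
    i.e. hypothetical candidate positions for mutation."""
    scores = eigenvector_centrality(contact_graph(coords, cutoff))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy straight "backbone" with 3.8 A C-alpha spacing (not real P450 data).
    chain = [(3.8 * i, 0.0, 0.0) for i in range(7)]
    print(top_central_residues(chain, k=1))  # → [3]
```

In this toy chain the middle residue is the most central because it sits at the hub of the densest neighborhood. On a real structure, the highest-ranked positions would be candidate sites whose variants could then be scored by an affinity model, in place of sampling positions at random.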

Version published to 10.21203/rs.3.rs-6971884/v1 on Research Square on Aug 11, 2025.

Related articles

Feature-Optimized Machine Learning Benchmarking for Protein Interface Prediction in Permanent Homodimer Complexes with Distinct Structural Features

This article has 4 authors:
1. Tayyip Topuz
2. Zeki Erdem
3. Halil Bisgin
4. E. Demet Akten
This article has no evaluations
Latest version Feb 2, 2026

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations
Latest version Feb 4, 2026

Predictive Bioactivity Modeling and Structural Binding Analysis for the Identification of Potential SMYD3 Modulators

This article has 4 authors:
1. Abdullah R. Alzahrani
2. Zia Ur Rehman
3. Talha Jawaid
4. Abida Khan
This article has no evaluations
Latest version Jan 28, 2026