Prediction of enzyme functions and rational design of enzyme variants using GEnESIS: Graph-based Enzyme Evolution with Structure-Informed Scoring
Abstract
Engineered enzymes can be used to sustainably produce valuable chemicals. However, current machine learning-based enzyme prediction models depend on database annotations and primarily use sequence-level information, without incorporating large protein-ligand three-dimensional (3D) structure datasets. Therefore, we created a graph neural network (GNN) to predict enzyme-substrate affinity from protein 3D structures. When the model was trained with 40,718 cytochrome P450 (P450) structures, a strong correlation (R² = 0.87) was observed between docking-based affinity and predicted affinity for the model aromatic substrate tyrosine; this result indicated that the model performed well as a feature extractor and captured non-linear relationships between the substrate and protein structures. Unsupervised clustering based on docking poses suggested that 854 of the identified P450s show high potential to convert tyrosine to L-3,4-dihydroxyphenylalanine. Optimal P450s were ranked using substrate graph-based clustering and predicted affinity. The active sites of the best candidates showed tyrosine in close proximity to the active-center iron. Eigenvector centrality of the graph representation of the selected enzyme 3D structure was then used to rapidly design highly reactive enzyme variants, and the GNN-based affinity prediction model was used to score these high-potential variants. Selecting amino acid residues based on eigenvector centrality sampled high-affinity variants more effectively than random residue selection (p = 0.0012). The improvement in reactivity of variants optimized through graph eigenvector centrality was supported by molecular dynamics (MD) simulations. This structure-based GNN approach will be used to accelerate the directed evolution of novel P450-catalyzed reactions.
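To illustrate the residue-selection step, the following is a minimal sketch of eigenvector centrality on a residue contact graph. It is not the authors' implementation: the abstract does not state how the graph is built, so the distance-cutoff construction, the 8 Å cutoff, the one-coordinate-per-residue input, and the function names `contact_graph`, `eigenvector_centrality`, and `top_residues` are all assumptions made for illustration. The toy coordinates are hypothetical and do not come from any P450 structure.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Build adjacency lists: residues i and j are linked when their
    coordinates lie within `cutoff` (assumed to be in angstroms)."""
    n = len(coords)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def eigenvector_centrality(adj, iters=1000, tol=1e-10):
    """Power iteration on (A + I). The identity shift keeps the same
    eigenvectors as A but avoids oscillation on bipartite graphs."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def top_residues(adj, k):
    """Return the indices of the k residues with the highest centrality."""
    c = eigenvector_centrality(adj)
    return sorted(range(len(c)), key=lambda i: c[i], reverse=True)[:k]


# Hypothetical toy structure: one central residue with four neighbours.
coords = [(0, 0, 0), (6, 0, 0), (-6, 0, 0), (0, 6, 0), (0, -6, 0)]
adj = contact_graph(coords)
print(top_residues(adj, 2))
```

In a workflow like the one described, the highest-ranked residues would be the candidate mutation sites, and each resulting variant would then be scored by the affinity model. That scoring step is outside this sketch.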