Decoding substrate specificity determining factors in glycosyltransferase-B enzymes – Insights from machine learning models
Abstract
Substrate specificity is an essential characteristic of any enzyme’s function, and understanding the factors that determine it is crucial for enzyme engineering. Unlike an enzyme’s structure, which is directly shaped by its sequence, substrate specificity has a more indirect relationship with sequence because it also depends on structural aspects that dictate substrate accessibility and active-site dynamics. In this study, we explore the performance of classifier-based machine learning models trained on curated sequence and structural data for a class of glycosyltransferases (GTs), namely GT-Bs, to understand the factors that determine their substrate specificity. GTs transfer sugar moieties to other biomolecules, such as oligosaccharides or proteins, and are found in all kingdoms of life. In plants, GTs participate in the biosynthesis of cell wall biopolymers (e.g., hemicelluloses and pectins) and are an integral part of the enzymatic machinery that stores carbon and energy as plant biomass. To elucidate the substrate specificity of uncharacterized GT-Bs, we constructed multi-label machine learning models (Support Vector Classifier, K-Nearest Neighbors, Gaussian Naïve Bayes, Random Forest) that incorporate both sequence and structural features. These models achieve good predictive accuracies on test datasets. However, despite our use of structural information, there remains scope for improving how these models are trained so that they yield interpretable relationships between sequence, structure and substrate-specificity-determining motifs in GT-Bs.
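To make the idea of a multi-label classifier concrete, here is a minimal sketch of one of the model families named above, K-Nearest Neighbors, written in plain Python. It is not the pipeline used in the study. It assumes amino-acid composition as the only feature, whereas the study combines curated sequence and structural features. The sequences, the substrate labels attached to them, and the names `composition`, `predict_labels` and `TRAIN` are all made up for illustration.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino-acid frequency vector of a sequence."""
    counts = Counter(seq)
    # Ignore non-standard residues; avoid division by zero for empty input.
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_labels(train, query_seq, k=3, threshold=0.5):
    """Multi-label k-NN: keep every label carried by more than
    `threshold` of the k nearest training sequences."""
    q = composition(query_seq)
    neighbours = sorted(
        train, key=lambda item: euclidean(composition(item[0]), q)
    )[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, n in votes.items() if n / len(neighbours) > threshold}


# Toy training set: invented sequences with invented label sets.
# An enzyme may carry more than one label, which is what makes
# the problem multi-label rather than multi-class.
TRAIN = [
    ("GGGGSSGGSA", {"UDP-Glc"}),
    ("GGSGGSGGAA", {"UDP-Glc", "UDP-Gal"}),
    ("GSGGGGSSGA", {"UDP-Glc", "UDP-Gal"}),
    ("LLKKLLEELK", {"UDP-Xyl"}),
    ("KLLEKLLKEL", {"UDP-Xyl"}),
    ("LEKKLLELKL", {"UDP-Xyl", "UDP-Gal"}),
]

if __name__ == "__main__":
    print(sorted(predict_labels(TRAIN, "GGSGGGSGGA")))
    print(sorted(predict_labels(TRAIN, "KLLELKKLLE")))
```

Each label is voted on independently, so a query can receive zero, one or several substrates. A nearest-neighbor model of this kind offers little insight into which residues or motifs drive a prediction, which illustrates the interpretability gap the abstract points to.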