Decoding the Multi-Dimensional Complexity of Glycosylation Reaction via Machine Learning
Abstract
Precise stereocontrol in glycosidic bond formation remains a central challenge in carbohydrate chemistry. It is governed by subtle, interdependent chemical and environmental factors, and this difficulty limits access to complex oligosaccharides and glycoconjugates. We developed a data-efficient machine learning framework to model, optimize, and control both the stereoselectivity and the efficiency of glycosylation. Our hybrid model is built on more than 800 validated, parameterized batch glycosylation reactions. It integrates quantified chemical descriptors with a novel Environmental Factor Impact (EFI) index that captures structure-reactivity-environment interdependencies. Because the EFI quantifies environmental influences as an index, it simplifies otherwise multidimensional data. The resulting system achieves state-of-the-art predictive accuracy for stereoselectivity (R² = 0.98) and yield (R² = 0.97), with an RMSE of 2% for both, and it generalizes to new chemical space. Crucially, it supports bidirectional inference: forward prediction of outcomes from reaction conditions, and inverse design of conditions that achieve a targeted selectivity. The framework enables algorithm-guided optimization and accurate extrapolation to untested glycosylating agent-alcohol pairs. Together, these capabilities transform glycosylation from empirical trial and error into a predictive, data-driven process for carbohydrate synthesis and glycoscience.
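The abstract does not say how the model or its inverse mode is implemented. The following sketch is only a rough illustration of two of the ideas it mentions. First, it shows how the reported R² and RMSE metrics are conventionally computed. Second, it shows one generic way to run a forward model "in reverse": score every candidate on a grid of conditions and keep the one whose predicted selectivity is closest to a target. The names `toy_forward_model` and `inverse_design`, the condition variables `temperature_c` and `solvent_polarity`, and all coefficients are hypothetical placeholders. They are not the authors' model, descriptors, or EFI index.

```python
import itertools
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def toy_forward_model(conditions):
    """HYPOTHETICAL forward model: conditions -> predicted selectivity (%).

    The linear form and coefficients are invented for illustration only;
    they are not fitted to any real glycosylation data.
    """
    score = 20.0 - 0.5 * conditions["temperature_c"] + 60.0 * conditions["solvent_polarity"]
    return max(0.0, min(100.0, score))  # clip to a valid percentage


def inverse_design(forward_model, search_space, target):
    """Exhaustive grid search: return the condition set whose predicted
    outcome is closest to `target`, together with the absolute error."""
    names = list(search_space)
    best, best_err = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        candidate = dict(zip(names, values))
        err = abs(forward_model(candidate) - target)
        if err < best_err:  # strict '<' keeps the first candidate on ties
            best, best_err = candidate, err
    return best, best_err


# Hypothetical search space of reaction conditions.
search_space = {
    "temperature_c": [-78, -40, 0, 25],
    "solvent_polarity": [0.1, 0.3, 0.5, 0.7, 0.9],
}
best, err = inverse_design(toy_forward_model, search_space, target=90.0)
print(best, err)  # -> {'temperature_c': -78, 'solvent_polarity': 0.5} 1.0
```

A grid search is the simplest inverse-design strategy, but its cost grows quickly with the number of condition variables. Bayesian optimization and gradient-based search over a differentiable surrogate are common alternatives. The abstract does not state which approach, if any, the authors use.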