Deep Learning Prediction of Intact N-/O-Glycopeptide Tandem Mass Spectra Enhances Glycoproteomics
Abstract
Protein glycosylation, a post-translational modification in which glycans are attached to proteins, plays critical roles in numerous physiological and pathological cellular functions. Characterizing protein glycosylation is among the most challenging problems in proteomics because of the high heterogeneity of glycosites and glycan structures. Recently, deep learning has been adopted to predict N-glycopeptide tandem mass (MS/MS) spectra and has shown promising results in N-glycoproteomics. However, current deep learning frameworks struggle to accurately predict O-glycopeptide MS/MS spectra because of the complexity of O-glycopeptides and the limited availability of training data. In this study, we introduce DeepGPO, a deep learning framework for predicting both N- and O-glycopeptide MS/MS spectra. DeepGPO incorporates a Transformer module alongside two graph neural network (GNN) modules specifically designed to handle branched glycans. To address data scarcity in O-glycoproteomics datasets, DeepGPO adopts several training methods, including training weights for different MS/MS spectra and pre-training strategies. DeepGPO accurately predicts N- and O-glycopeptide MS/MS spectra. With the predicted MS/MS spectra, glycosylation sites can be localized even in the absence of site-determining ions, for instance when higher-energy collisional dissociation (HCD) is used to localize O-glycosylation. We also explored the possibility of differentiating O-glycosites and N-glycosites using the predicted MS/MS spectra. DeepGPO primarily addresses mono-glycosylated peptides and can handle doubly glycosylated peptides, but it currently cannot process peptides with three or more glycan modifications because of limited training data and increased spectral complexity. We anticipate that DeepGPO will inspire future advances in glycoproteomics research.
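To make the idea of localizing a glycosite from predicted spectra concrete, the following minimal sketch ranks candidate sites by how closely each site's predicted spectrum matches the observed one. It is an illustration under stated assumptions, not the DeepGPO implementation: the abstract does not specify how predicted and experimental spectra are compared, so cosine similarity is assumed here, the function names `cosine_similarity` and `rank_candidate_sites` are hypothetical, and the intensity vectors are invented toy values aligned on a shared list of fragment ions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def rank_candidate_sites(observed, predicted_by_site):
    """Rank candidate glycosites, best match first.

    observed: fragment-ion intensities from the experimental spectrum.
    predicted_by_site: maps a candidate site label to the intensities
        predicted for the glycopeptide with the glycan at that site.
    """
    scores = {
        site: cosine_similarity(observed, predicted)
        for site, predicted in predicted_by_site.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (assumed, for illustration only): one observed spectrum and
# predicted spectra for two hypothetical candidate sites.
observed = [0.9, 0.1, 0.5, 0.0]
predicted_by_site = {
    "S3": [0.8, 0.2, 0.5, 0.1],
    "T7": [0.1, 0.9, 0.0, 0.5],
}

ranking = rank_candidate_sites(observed, predicted_by_site)
best_site, best_score = ranking[0]
print(best_site, round(best_score, 3))
```

In this toy case the candidate whose predicted intensity pattern resembles the observed spectrum ranks first. A real workflow would also need peak matching within a mass tolerance and some measure of confidence in the top-ranked site, both of which are omitted here.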