Large-scale discovery and annotation of hidden substructure patterns in mass spectrometry profiles
Abstract
Untargeted mass spectrometry measures many spectra of unknown molecules. To help annotate them, tandem mass spectrometry (MS/MS) produces fragmentation spectra whose recurring patterns represent common substructures. MS2LDA uses unsupervised topic modelling to discover these patterns as Mass2Motifs. However, MS/MS-based substructure identification remains limited by computational efficiency and interpretability. Here, we report a speed-up of up to 14× achieved through improved algorithmic efficiency. Furthermore, the new automated Mass2Motif Annotation Guidance (MAG) aids the structural identification of Mass2Motifs. Benchmarked on three chemically diverse, curated MotifDB-MotifSets, MAG achieved median substructure overlap scores of 0.75, 0.93, and 0.95, demonstrating robust substructural annotations. We further validated MS2LDA 2.0 on experimental data by identifying substructures of pesticides spiked into a biological matrix, and demonstrated its discovery potential by annotating previously uncharacterized fungal natural products. Together with the new visualization app and the MassQL-searchable MotifDB, we anticipate that MS2LDA 2.0 will boost the identification of novel chemistry and hidden patterns in mass spectrometry profiles.
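In the topic-modelling framing behind MS2LDA, spectra play the role of documents and discretised MS/MS features play the role of words, so Mass2Motifs correspond to the learned topics. The sketch below illustrates that idea in plain Python under stated assumptions; it is not the MS2LDA 2.0 implementation. The helper names `spectrum_to_words` and `lda_gibbs` are hypothetical, and so are the choices they make: `fragment_`/`loss_` word names, binning by rounding m/z to two decimals, a relative-intensity noise filter, one count per feature rather than intensity weighting, and a basic collapsed Gibbs sampler for LDA. The toy spectra at the end are invented for illustration only.

```python
import random
from collections import Counter


def spectrum_to_words(precursor_mz, peaks, decimals=2, min_rel_intensity=0.01):
    """Encode one MS/MS spectrum as a bag of fragment and neutral-loss words.

    Hypothetical encoding: each peak at or above `min_rel_intensity` of the
    base peak adds one fragment word and, if positive, one neutral-loss word
    (precursor m/z minus fragment m/z). Values are binned by rounding to
    `decimals` places when formatted.
    """
    words = Counter()
    if not peaks:
        return words
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return words
    for mz, intensity in peaks:
        if intensity / base < min_rel_intensity:
            continue  # skip low-intensity noise peaks
        words[f"fragment_{mz:.{decimals}f}"] += 1
        loss = precursor_mz - mz
        if loss > 0:
            words[f"loss_{loss:.{decimals}f}"] += 1
    return words


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic-word counts ("motifs")."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    # Expand each bag of words into a flat list of word tokens.
    tokens = [[w for w, c in d.items() for _ in range(c)] for d in docs]
    ndk = [[0] * n_topics for _ in docs]        # document-topic counts
    nkw = [Counter() for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                         # tokens assigned per topic
    z = []                                      # current topic of every token

    def update(d, k, w, delta):
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta

    # Random initial topic assignments.
    for d, toks in enumerate(tokens):
        z.append([])
        for w in toks:
            k = rng.randrange(n_topics)
            z[d].append(k)
            update(d, k, w, +1)

    for _ in range(n_iter):
        for d, toks in enumerate(tokens):
            for i, w in enumerate(toks):
                update(d, z[d][i], w, -1)  # remove the token's current assignment
                # Full conditional p(z = k | all other assignments), unnormalised.
                weights = [
                    (ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                update(d, k, w, +1)
    return nkw


# Usage on invented toy spectra: (precursor m/z, [(fragment m/z, intensity), ...]).
spectra = [
    (200.00, [(105.07, 100.0), (77.04, 40.0)]),
    (250.00, [(105.07, 80.0), (145.00, 100.0)]),
    (180.00, [(91.05, 100.0), (65.04, 30.0)]),
]
docs = [spectrum_to_words(p, peaks) for p, peaks in spectra]
for k, motif in enumerate(lda_gibbs(docs, n_topics=2)):
    print(k, (+motif).most_common(3))  # unary + drops zero counts
```

The collapsed Gibbs sampler is used here only because it fits in a few dependency-free lines; it is not necessarily the inference scheme MS2LDA 2.0 uses, and a real pipeline would also need spectrum preprocessing and motif annotation steps such as MAG, which this sketch does not attempt.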