An open-source platform for reference data-driven analysis of untargeted metabolomics
Abstract
Untargeted tandem mass spectrometry (MS/MS)-based metabolomics enables broad characterization of small molecules in complex samples, yet the majority of spectra in a typical experiment remain unannotated, limiting biological interpretation. Reference data-driven (RDD) metabolomics addresses this gap by contextualizing spectra through comparison to curated, metadata-annotated reference datasets, allowing inference of spectrum origins without requiring exact structural identification. Here, we present an open-source RDD metabolomics platform comprising a user-friendly web application and a Python software package that perform RDD analyses directly from molecular networking outputs generated by GNPS. The tools support visualization and statistical analysis of RDD results, including interactive bar plots, heat maps, principal component analysis, and Sankey diagrams. We illustrate the approach using a hierarchical reference dataset of 3,500 food items to derive dietary patterns from stool metabolomics data of omnivore and vegan participants. The analysis reveals clear dietary group separation, demonstrating how RDD metabolomics can extract biologically meaningful patterns from otherwise unannotated spectra. Thus, the RDD metabolomics platform removes technical barriers to the adoption of reference data-driven analysis by the metabolomics community, with the functionality freely available at https://github.com/bittremieuxlab/gnps-rdd and https://gnps-rdd.streamlit.app/.
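The core idea of contextualizing spectra through metadata-annotated references can be illustrated with a minimal sketch. This is not the API of the gnps-rdd package. The cluster table, file names, category labels, and the `rdd_counts` helper are all hypothetical, and a single category level stands in for the hierarchical reference metadata. The sketch assumes that a molecular network cluster containing spectra from both a study sample and a reference file counts as one match for each category of that reference.

```python
from collections import Counter

# Hypothetical toy input: the files that contributed spectra to each
# molecular network cluster (a stand-in for a GNPS networking output).
cluster_files = {
    1: {"stool_A", "ref_apple", "ref_pear"},
    2: {"stool_A", "ref_beef"},
    3: {"stool_B", "ref_apple"},
    4: {"stool_B"},               # no reference in the cluster: no context
    5: {"ref_beef", "ref_pear"},  # reference files only: nothing to annotate
}

# Hypothetical reference metadata: one category per reference file.
reference_category = {
    "ref_apple": "fruit",
    "ref_pear": "fruit",
    "ref_beef": "meat",
}


def rdd_counts(cluster_files, reference_category):
    """Count, per study sample, the clusters shared with each reference category."""
    reference_files = set(reference_category)
    counts = {}
    for files in cluster_files.values():
        references = files & reference_files
        samples = files - reference_files
        # A cluster contributes at most once per category, even when several
        # reference files of that category fall into it.
        categories = {reference_category[ref] for ref in references}
        for sample in samples:
            counts.setdefault(sample, Counter()).update(categories)
    return counts


counts = rdd_counts(cluster_files, reference_category)
for sample in sorted(counts):
    print(sample, dict(counts[sample]))
```

Per-sample count tables of this kind are the sort of input that bar plots, heat maps, or principal component analysis could then summarize. How the platform itself defines and normalizes matches is not specified in the abstract.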