An open-source platform for reference data-driven analysis of untargeted metabolomics



Abstract

Untargeted tandem mass spectrometry (MS/MS)-based metabolomics enables broad characterization of small molecules in complex samples, yet most spectra in a typical experiment remain unannotated, which limits biological interpretation. Reference data-driven (RDD) metabolomics addresses this gap by contextualizing spectra through comparison to curated, metadata-annotated reference datasets, allowing inference of spectrum origins without requiring exact structural identification. Here, we present an open-source RDD metabolomics platform comprising a user-friendly web application and a Python software package that perform RDD analyses directly from molecular networking outputs generated by GNPS. The tools support visualization and statistical analysis of RDD results, including interactive bar plots, heat maps, principal component analysis, and Sankey diagrams. We illustrate the approach using a hierarchical reference dataset of 3,500 food items to derive dietary patterns from stool metabolomics data of omnivore and vegan participants. The analysis reveals clear separation between the dietary groups, demonstrating how RDD metabolomics can extract biologically meaningful patterns from otherwise unannotated spectra. The RDD metabolomics platform thus removes technical barriers that have kept the metabolomics community from adopting reference data-driven analysis, and its functionality is freely available at https://github.com/bittremieuxlab/gnps-rdd and https://gnps-rdd.streamlit.app/.
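The core RDD idea, inferring where a sample's spectra come from by checking which annotated reference items share molecular network clusters with them, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the gnps-rdd package API. The function names `rdd_counts` and `to_proportions` are hypothetical. The input is assumed to be simplified to `(cluster_id, filename)` pairs rather than the actual GNPS output tables. The counting rule is also an assumption: a sample gains one count per cluster for each reference category co-occurring in that cluster. Hierarchy levels are handled by truncating each reference item's ontology path, for example a food path such as `("plant", "fruit", "apple")`.

```python
from collections import defaultdict


def rdd_counts(network, sample_files, reference_paths, level):
    """Hypothetical sketch of reference data-driven (RDD) counting.

    network: iterable of (cluster_id, filename) pairs, meaning a spectrum
        from `filename` falls into molecular network cluster `cluster_id`.
    sample_files: filenames of the study samples.
    reference_paths: maps a reference filename to its ontology path,
        e.g. ("plant", "fruit", "apple").
    level: ontology depth to aggregate at (1 = top level).
    Returns {sample: {category: count}}.
    """
    samples = set(sample_files)

    # Group filenames by the network cluster they fall into.
    clusters = defaultdict(set)
    for cluster_id, filename in network:
        clusters[cluster_id].add(filename)

    counts = defaultdict(lambda: defaultdict(int))
    for members in clusters.values():
        # Reference categories present in this cluster, truncated to `level`.
        # Joining the path prefix keeps same-named subcategories in
        # different branches distinct.
        categories = {
            "/".join(reference_paths[f][:level])
            for f in members
            if f in reference_paths and len(reference_paths[f]) >= level
        }
        # Assumed rule: each co-clustering sample gets one count per category.
        for sample in members & samples:
            for category in categories:
                counts[sample][category] += 1
    return {s: dict(c) for s, c in counts.items()}


def to_proportions(counts):
    """Normalize each sample's category counts to fractions of its total."""
    proportions = {}
    for sample, per_category in counts.items():
        total = sum(per_category.values())
        proportions[sample] = {c: n / total for c, n in per_category.items()}
    return proportions


if __name__ == "__main__":
    # Toy data: two stool samples and three annotated food reference items.
    network = [
        (1, "s1"), (1, "apple"), (1, "beef"),
        (2, "s1"), (2, "s2"), (2, "apple"),
        (3, "s2"), (3, "lentil"),
    ]
    references = {
        "apple": ("plant", "fruit", "apple"),
        "beef": ("animal", "meat", "beef"),
        "lentil": ("plant", "legume", "lentil"),
    }
    level1 = rdd_counts(network, {"s1", "s2"}, references, level=1)
    print(level1)
    print(to_proportions(level1))
    print(rdd_counts(network, {"s1", "s2"}, references, level=2))
```

A per-sample table of category proportions like this is the kind of matrix that bar plots, heat maps, or principal component analysis could then summarize across dietary groups.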
