MetaboFM: A Foundation Model for Spatial Metabolomics

Abstract

Mass spectrometry imaging (MSI) provides molecularly resolved maps of metabolites and lipids across tissues, yet the lack of large-scale, unified representation learning frameworks limits its potential for generalization and downstream analysis. Here, we introduce MetaboFM, a foundation model for spatial metabolomics that consolidates thousands of public MSI datasets into standardized spatial–spectral tensors and extracts transferable embeddings using pretrained Vision Transformers. We curated and standardized around 4000 publicly available MSI datasets from the METASPACE repository, spanning multiple organisms, tissue types, ionization modes, and instruments. Across six metadata prediction tasks (organism, ionization polarity, tissue type, condition, analyzer type, and ionization source), embeddings from pretrained MetaboFM encoders achieved a mean macro-F1 of 0.74 and accuracy of 0.80 with linear probes, outperforming classical principal component analysis (PCA) and randomly initialized baselines by more than 20 percentage points. To interpret the learned representations, we mapped embedding directions back to the m/z domain, revealing distinct spectral regions that drive class separation across tissues, conditions, and ionization sources. A multimodal visual question answering (VQA) extension further links MSI embeddings with natural-language queries through a cross-attention fusion module, attaining an average macro-F1 of 0.61 ± 0.05 across tasks. Finally, an interactive Gradio interface enables users to visualize MSI patches and query sample metadata in free-form language. Together, MetaboFM establishes a scalable foundation model paradigm for MSI, unifying representation learning, spectral interpretability, and multimodal interaction within a single framework for spatial metabolomics.
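
The abstract reports both macro-F1 and accuracy, and the two can diverge when classes are imbalanced, as metadata labels such as organism or tissue type often are. The sketch below is only an illustration of how the two metrics are computed. It is not the authors' evaluation code, and the labels are hypothetical toy values rather than data from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy labels (not taken from the paper).
y_true = ["mouse"] * 8 + ["human"] * 2
y_pred = ["mouse"] * 8 + ["mouse", "human"]

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

Because macro-F1 weights every class equally, a single error on the minority class lowers it more than it lowers accuracy, which is consistent with a macro-F1 that sits below the accuracy figure.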
