Discovery of Novel Anticancer Agents and Influenza Potential Biomarkers Through a Mass Spectrometry Foundation Model

Abstract

Metabolomics based on non-targeted tandem mass spectrometry generates rich biological data, but novel metabolites whose structures are absent from databases remain challenging to analyze. Existing algorithms predict isolated chemical features such as molecular fingerprints or structural classes but fail to integrate them into reliable structure-level predictions, particularly for complex metabolites or for noisy spectra with sparse fragments. Here we present ComFaceID, a foundation model that revolutionizes de novo structure profiling by generating 500-dimensional embeddings from MS² spectra, enabling parallel prediction of diverse structural descriptors. ComFaceID consistently outperforms state-of-the-art tools across key tasks, including library search, classification, and fingerprint prediction, even under challenging conditions such as complex metabolomic backgrounds with noisy and information-poor spectra. By integrating the outputs of these multi-task predictions, we further developed a multi-parameter framework that significantly improves prioritization accuracy over single-parameter approaches. When applied to 3,334 actinomycete extracts, ComFaceID uncovered six novel compounds across three structural classes, including a unique hexahydroindolizine scaffold. Two compounds showed potent, broad-spectrum cytotoxicity superior to that of doxorubicin, with lower off-target toxicity. In an H1N1 infection model, the ComFaceID pipeline identified over 40 unannotated potential biomarkers, revealing pulmonary inflammation-induced gut metabolic remodeling marked by increased saturated fatty acyl phospholipids and reduced bile acids. By bridging spectral interpretation and novel metabolite discovery, ComFaceID establishes a new workflow for structure-informed metabolomics.
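
As a rough illustration of what an embedding-based library search could look like, the sketch below ranks reference entries by cosine similarity to a query embedding. This is not the ComFaceID implementation: the abstract does not state the similarity measure, so cosine similarity is an assumption, as are the function names `cosine_similarity` and `library_search`, the compound names, and the toy 4-dimensional vectors standing in for the 500-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def library_search(query, library, top_k=3):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical reference library: toy 4-dimensional vectors in place of
# real 500-dimensional spectrum embeddings.
library = {
    "compound_A": [1.0, 0.0, 0.0, 0.0],
    "compound_B": [0.0, 1.0, 0.0, 0.0],
    "compound_C": [0.7, 0.7, 0.0, 0.0],
}

# Hypothetical embedding of an unknown MS² spectrum.
query = [0.9, 0.1, 0.0, 0.0]

hits = library_search(query, library, top_k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

In this toy case the query ranks closest to `compound_A`, followed by `compound_C`. A real pipeline would obtain the vectors from the trained model and would likely use a vectorized or indexed nearest-neighbor search rather than a Python loop.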
