Learning from All Views: A Multiview Contrastive Framework for Metabolite Annotation
Abstract
Metabolomics, enabled by high-throughput mass spectrometry, promises to advance our understanding of cellular biochemistry and guide new discoveries in disease mechanisms, drug development, and personalized medicine. However, because assigning molecular structures to measured spectra is challenging, annotation rates remain low, which hinders these advances. We present MultiView Projection (MVP), a novel framework for learning a joint embedding space between molecules and spectra by leveraging multiple data views: molecular graphs, molecular fingerprints, spectra, and consensus spectra. MVP builds on contrastive multiview learning to capture mutual information across views, leading to more robust and generalizable representations for spectral annotation. Unlike prior approaches that incorporate multiple views via concatenation or as targets of auxiliary tasks, MVP learns from all views jointly, resulting in improved molecular candidate ranking. Notably, MVP supports annotation using either individual spectra or consensus spectra, enabling flexible use of multiple measurements. On the MassSpecGym benchmark, we show that annotation using query consensus spectra significantly outperforms rank aggregation strategies based on constituent spectrum annotation. Using the consensus spectrum view, MVP achieves 35.99% and 13.96% rank@1 when candidates are retrieved by mass and by formula, respectively. When ranking with individual spectra, MVP performs on par with or better than existing methods, achieving 26.37% and 11.10% rank@1 for candidates retrieved by mass and by formula, respectively. MVP offers a flexible, extensible foundation for learning from multiple views of molecule and spectrum data.
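To make the idea of contrastive learning across several views concrete, the following is a minimal NumPy sketch, not the MVP implementation. The abstract does not specify the loss, so everything here is an assumption for illustration: a symmetric InfoNCE objective averaged over all pairs of views, cosine similarity, a temperature of 0.1, and the hypothetical helper names `info_nce`, `multiview_loss`, and `rank_at_1`. Each view is assumed to be already encoded as a matrix whose row i corresponds to the same molecule in every view.

```python
import itertools
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(a, b, temperature=0.1):
    # Symmetric InfoNCE between two views; row i of `a` and `b` is a matched pair.
    # Assumption: this specific loss is illustrative, not taken from the paper.
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


def multiview_loss(views, temperature=0.1):
    # Average the pairwise loss over every pair of views (hypothetical choice).
    # `views` maps a view name to an (n_samples, dim) embedding matrix.
    pairs = list(itertools.combinations(views, 2))
    total = sum(info_nce(views[i], views[j], temperature) for i, j in pairs)
    return total / len(pairs)


def rank_at_1(query, candidates, true_index):
    # Return 1 if the true candidate has the highest cosine similarity, else 0.
    q = l2_normalize(query[None, :])
    c = l2_normalize(candidates)
    scores = (c @ q.T).ravel()
    return int(np.argmax(scores) == true_index)


# Toy data: four views that are noisy copies of one shared embedding.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
view_names = ["graph", "fingerprint", "spectrum", "consensus_spectrum"]
aligned = {name: base + 0.01 * rng.normal(size=base.shape) for name in view_names}

# Misalign one view by shifting its rows, so no sample matches its partners.
misaligned = dict(aligned)
misaligned["spectrum"] = np.roll(aligned["spectrum"], 1, axis=0)

loss_aligned = multiview_loss(aligned)
loss_misaligned = multiview_loss(misaligned)
hit = rank_at_1(aligned["spectrum"][3], aligned["graph"], true_index=3)
```

In this toy setup the loss is lower when all views agree on which rows belong together, which is the signal a multiview contrastive objective exploits. At query time, ranking candidate molecule embeddings by similarity to a spectrum embedding, as in `rank_at_1`, mirrors the retrieval setting in which rank@1 is reported.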