Spec2Class: Accurate Prediction of Plant Secondary Metabolite Class using Deep Learning

Abstract

Mass spectrometry (MS)-based data is commonly used in studying metabolism and natural products, but typically requires domain-specific skill and experience to analyze. Existing computational tools for non-targeted metabolite analysis (i.e., metabolomics) mostly rely on comparison to reference MS spectral libraries for metabolite identification, which limits the annotation of metabolites for which reference spectra do not exist. This is the case for plant secondary metabolites, where most spectral features remain unidentified. Here, we developed Spec2Class, a deep-learning algorithm for the identification and classification of plant secondary metabolites from liquid chromatography (LC)-MS/MS spectra. We used an in-house spectral library of 7973 plant metabolite chemical standards, alongside publicly available data, to train Spec2Class to classify LC-MS/MS spectra into 43 common plant secondary metabolite classes. Tested on held-out sets, our algorithm achieved an overall accuracy of 73%, outperforming state-of-the-art classification. We further established a prediction certainty parameter that sets a threshold for filtering out low-confidence results. Applying this threshold, we reached an accuracy of 93% on an unseen dataset. We show that our predictions are highly robust to noise and to the data acquisition method. Spec2Class is publicly available and is anticipated to facilitate metabolite identification and accelerate natural product discovery.
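
The effect of a prediction certainty threshold can be illustrated with a minimal sketch. The abstract does not define how Spec2Class computes its certainty parameter, so this example assumes the simplest stand-in: the highest class probability for each spectrum. The function names, the toy probabilities, and the threshold value of 0.5 are all hypothetical and are not taken from Spec2Class. The sketch shows the general trade-off: discarding low-confidence predictions can raise accuracy on the retained spectra while reducing the fraction of spectra that receive a class.

```python
from typing import List, Optional, Sequence, Tuple


def filter_by_certainty(
    probabilities: Sequence[Sequence[float]],
    threshold: float,
) -> List[Optional[int]]:
    """Return the top class index per spectrum, or None when its
    probability falls below the threshold (assumed certainty measure)."""
    predictions: List[Optional[int]] = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])
        predictions.append(best if row[best] >= threshold else None)
    return predictions


def accuracy_and_coverage(
    predictions: Sequence[Optional[int]],
    labels: Sequence[int],
) -> Tuple[float, float]:
    """Accuracy over retained predictions, and the fraction retained."""
    kept = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    if not kept:
        return 0.0, 0.0
    correct = sum(1 for p, y in kept if p == y)
    return correct / len(kept), len(kept) / len(labels)


# Toy data (made up): class probabilities for four spectra, three classes.
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
]
labels = [0, 1, 1, 2]

# Hypothetical threshold; a real value would be tuned on validation data.
preds = filter_by_certainty(probs, threshold=0.5)
acc, cov = accuracy_and_coverage(preds, labels)
print(f"accuracy on retained spectra: {acc:.2f}, coverage: {cov:.2f}")
```

In this toy case the two low-confidence spectra are left unclassified, so accuracy is computed only over the two retained predictions. In practice the threshold would be chosen by weighing accuracy against the share of spectra left without a class.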

Significance Statement

Untargeted mass spectrometry (MS) is essential for natural product discovery but is limited by product identification, which is often manual and requires domain-specific skills. Spec2Class addresses this limitation by accurately classifying plant secondary metabolites from LC-MS/MS spectra without reliance on reference spectral libraries. Trained on an in-house library of 7973 plant metabolite chemical standards alongside publicly available data, it outperforms state-of-the-art algorithms, and with a prediction certainty threshold applied it reaches 93% accuracy on an unseen dataset. The tool is highly robust to noise and to the data acquisition method, and is expected to streamline metabolite identification and expedite natural product research. Spec2Class is open-source, publicly available, and easy to integrate into natural product discovery pipelines.
