MassCube: a Python framework for end-to-end metabolomics data processing from raw files to phenotype classifiers

Abstract

Nontargeted peak detection in LC-MS-based metabolomics must become robust and be systematically benchmarked. We present MassCube, an open-source Python framework for MS data processing, which we systematically benchmarked against other algorithms and on different types of input data. Starting from raw data, MassCube detects peaks by constructing mass traces through signal clustering and Gaussian-filter-assisted edge detection. Peaks are then grouped to detect adducts and in-source fragments, and compounds are annotated by both identity and fuzzy searches. The final data tables undergo quality control and can be used for metabolome-informed phenotype prediction. Peak detection in MassCube achieves 100% signal coverage and reports comprehensive chromatographic metadata for quality assurance. MassCube outperforms MS-DIAL, MZmine3, and XCMS in speed, isomer detection, and accuracy. It supports diverse numerical routines for MS data analysis while remaining efficient: it processed 105 GB of Astral MS data on a laptop within 64 minutes, whereas the other programs took 8-24 times longer. When applied to the Metabolome Atlas of the Aging Mouse Brain data, MassCube automatically detected age, sex, and regional differences despite batch effects. MassCube is available at https://github.com/huaxuyu/masscube for direct use or for integration into larger applications in omics or biomedical research.
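The peak-detection idea summarized above (mass traces built by signal clustering, then Gaussian-filter-assisted edge detection) can be illustrated with a small pure-Python sketch. The abstract does not specify MassCube's clustering rule, kernel width, or edge criterion. The function names `build_mass_traces`, `gaussian_smooth`, and `detect_peaks`, the fixed m/z tolerance in Da, and the "walk downhill on the smoothed trace" edge rule are all hypothetical choices for illustration, not MassCube's API or algorithm.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Peak:
    apex: int      # scan index of the smoothed maximum
    left: int      # scan index of the left edge
    right: int     # scan index of the right edge
    height: float  # highest raw intensity between the edges


@dataclass
class _Trace:
    mz_sum: float = 0.0
    points: dict = field(default_factory=dict)  # scan index -> intensity

    @property
    def mz(self) -> float:
        return self.mz_sum / len(self.points)  # mean m/z of assigned signals


def build_mass_traces(scans, mz_tol=0.005):
    """Cluster centroided (mz, intensity) signals from RT-ordered scans into
    mass traces. Greedy nearest-centre matching within a fixed tolerance (Da)
    is an illustrative assumption, not MassCube's actual rule."""
    traces = []
    for i, scan in enumerate(scans):
        for mz, intensity in scan:
            # candidates: close in m/z and not yet filled in this scan
            candidates = [t for t in traces
                          if abs(t.mz - mz) <= mz_tol and i not in t.points]
            if candidates:
                trace = min(candidates, key=lambda t: abs(t.mz - mz))
            else:
                trace = _Trace()
                traces.append(trace)
            trace.mz_sum += mz
            trace.points[i] = intensity
    # one intensity per scan, 0.0 where the ion was not detected
    return [(t.mz, [t.points.get(i, 0.0) for i in range(len(scans))])
            for t in traces]


def gaussian_smooth(y, sigma=1.0):
    """Convolve y with a Gaussian kernel truncated at 3 sigma, renormalising
    the weights near the ends of the trace."""
    radius = max(1, int(3 * sigma))
    weights = {k: math.exp(-0.5 * (k / sigma) ** 2)
               for k in range(-radius, radius + 1)}
    smoothed = []
    for i in range(len(y)):
        acc = total = 0.0
        for k, w in weights.items():
            if 0 <= i + k < len(y):
                acc += w * y[i + k]
                total += w
        smoothed.append(acc / total)
    return smoothed


def detect_peaks(y, sigma=1.0, min_height=0.0):
    """Find apexes as local maxima of the smoothed trace, then walk outwards
    while the smoothed signal keeps falling to place the peak edges."""
    s = gaussian_smooth(y, sigma)
    peaks = []
    for i in range(1, len(s) - 1):
        if s[i] > s[i - 1] and s[i] >= s[i + 1] and s[i] > min_height:
            left = i
            while left > 0 and s[left - 1] < s[left]:
                left -= 1
            right = i
            while right < len(s) - 1 and s[right + 1] < s[right]:
                right += 1
            peaks.append(Peak(i, left, right, max(y[left:right + 1])))
    return peaks
```

In this sketch, smoothing before searching for maxima keeps single-scan noise spikes from splitting one chromatographic peak into several. The trade-off is that a wider kernel (larger `sigma`) can merge closely eluting isomers into one peak.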
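The grouping of peaks for adduct and in-source fragment detection mentioned in the abstract can be sketched as mass-difference matching among co-eluting features. The abstract does not describe MassCube's actual grouping criteria. The `Feature` type, the `group_ions` function, the tolerances, the short ion-form list, and the simplification of treating every feature as a candidate [M+H]+ ion are all hypothetical. Only the mass differences themselves are standard monoisotopic values.

```python
from dataclasses import dataclass

# Mass differences (Da) of common ion forms relative to [M+H]+.
# Electron masses cancel, so these are neutral-atom differences.
ION_SHIFTS = {
    "[M+NH4]+": 17.026549,     # + NH3
    "[M+Na]+": 21.981944,      # + Na - H
    "[M+K]+": 37.955881,       # + K - H
    "[M+H-H2O]+": -18.010565,  # in-source water loss
}


@dataclass
class Feature:
    fid: int   # feature identifier
    mz: float
    rt: float  # retention time, minutes


def group_ions(features, mz_tol=0.003, rt_tol=0.05):
    """Link co-eluting features whose m/z difference matches a known shift.

    Each feature is tried as a putative [M+H]+ ion (a simplifying assumption).
    Returns a list of (base_fid, partner_fid, ion_form) tuples.
    """
    links = []
    for base in features:
        for partner in features:
            if partner.fid == base.fid or abs(partner.rt - base.rt) > rt_tol:
                continue  # same feature, or not co-eluting
            delta = partner.mz - base.mz
            for form, shift in ION_SHIFTS.items():
                if abs(delta - shift) <= mz_tol:
                    links.append((base.fid, partner.fid, form))
    return links
```

A fuller implementation would likely also compare chromatographic peak shapes rather than relying on a retention-time window alone. It would also need to resolve conflicts when one feature matches several shifts. Both are omitted here.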

Article activity feed