Common data models to streamline metabolomics processing and annotation, and implementation in a Python pipeline

Joshua M. Mitchell
Yuanye Chi
Maheshwor Thapa
Zhiqiang Pang
Jianguo Xia
Shuzhao Li

Abstract

To standardize metabolomics data analysis and facilitate future computational developments, it is essential to have a set of well-defined templates for common data structures. Here we describe a collection of data structures involved in metabolomics data processing and illustrate how they are utilized in a full-featured Python-centric pipeline. We demonstrate the performance of the pipeline, and the details of annotation and quality control, using large-scale LC-MS metabolomics and lipidomics data and LC-MS/MS data. Multiple previously published datasets are also reanalyzed to showcase its utility in biological data analysis. This pipeline allows users to streamline data processing, quality control, annotation, and standardization in an efficient and transparent manner. This work fills a major gap in the Python ecosystem for computational metabolomics.

Author Summary

All life processes involve the consumption, creation, and interconversion of metabolites. Metabolomics is the comprehensive study of these small molecules, often using mass spectrometry, to provide critical information about health and disease. Automated processing of such metabolomics data is desirable, especially with tools and infrastructure already familiar to the bioinformatics community. Despite Python’s popularity in bioinformatics and machine learning, the Python ecosystem for computational metabolomics still lacks a complete data pipeline. We have developed an end-to-end computational metabolomics data processing pipeline, based on the raw data preprocessor Asari [1]. Our pipeline takes experimental data in .mzML or .raw format and outputs annotated feature tables for subsequent biological interpretation. We demonstrate the application of this pipeline to multiple metabolomics and lipidomics datasets. Accompanying the pipeline, we have designed a set of reusable data structures, released as the MetDataModel package, which should promote more consistent terminology and software interoperability in this area.
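To make the idea of shared data structures concrete, the following is a minimal sketch of how a peak, a cross-sample feature, and a simple mass-based annotation step could be modeled in Python. Every class, field, and function name here (`Peak`, `Feature`, `ppm_error`, `annotate`) is a hypothetical illustration and is not taken from the MetDataModel package or from Asari; the candidate names and m/z values are placeholders, and the 5 ppm tolerance is an assumed default.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Peak:
    """One chromatographic peak detected in a single sample (hypothetical)."""
    mz: float         # mass-to-charge ratio
    rtime: float      # retention time, in seconds
    intensity: float  # peak height or area


@dataclass
class Feature:
    """Peaks aligned across samples at a shared m/z and retention time (hypothetical)."""
    feature_id: str
    mz: float
    rtime: float
    peaks: List[Peak] = field(default_factory=list)
    annotation: Optional[str] = None  # filled in by annotate(), if a match is found


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def annotate(feature: Feature, candidates: Dict[str, float],
             tolerance_ppm: float = 5.0) -> Feature:
    """Assign the candidate with the smallest mass error within the tolerance.

    `candidates` maps a candidate name to its theoretical m/z.
    The annotation is set to None when no candidate is close enough.
    """
    best_name, best_err = None, None
    for name, theoretical_mz in candidates.items():
        err = abs(ppm_error(feature.mz, theoretical_mz))
        if err <= tolerance_ppm and (best_err is None or err < best_err):
            best_name, best_err = name, err
    feature.annotation = best_name
    return feature


# Placeholder data for illustration only.
candidates = {"candidate_A": 181.0707, "candidate_B": 182.0000}
matched = annotate(Feature("F1", mz=181.0708, rtime=95.2), candidates)
unmatched = annotate(Feature("F2", mz=200.0000, rtime=120.0), candidates)
```

Keeping peaks, features, and annotations as separate typed records is one way to let different tools exchange results without agreeing on a single table layout; the actual templates proposed by the authors may be organized differently.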

Version published to 10.1101/2024.02.13.580048v1 on bioRxiv
Feb 14, 2024

MicrobiomePhylo: A New Tool for Metabarcoding Data Downstream Analysis - A Real-World Data Analysis Demonstration

This article has 1 author:
1. Camilla Veronica Tafuro
This article has no evaluations
Latest version Apr 12, 2024

Integrating Genome-Scale Metabolic Models with Patient Plasma Metabolome to Study Endothelial Metabolism In Situ

This article has 6 authors:
1. Fernando Silva-Lance
2. Isabel Montejano-Montelongo
3. Eric Bautista
4. Lars K. Nielsen
5. Pär I. Johansson
6. Igor Marin de Mas
This article has no evaluations
Latest version Mar 4, 2024

ExpOmics: a comprehensive web platform empowering biologists with robust multi-omics data analysis capabilities

This article has 5 authors:
1. Douyue Li
2. Zhuochao Min
3. Jia Guo
4. Yubin Chen
5. Wenliang Zhang
This article has no evaluations
Latest version Apr 24, 2024
