ClusterApp to visualize, organize, and navigate metabolomics data

Abstract

Background

Clustering analysis is a foundational step in exploratory data analysis workflows, with dimensionality reduction methods commonly used to visualize multidimensional data in lower-dimensional spaces and infer sample clustering. Principal Component Analysis (PCA) is widely applied in metabolomics but is often suboptimal for clustering visualization. Metabolomics data often require specialized manipulations, such as blank removal, quality control adjustments, and data transformations, whose effects need to be visualized efficiently. However, few user-friendly clustering tools are available to metabolomics researchers who lack computational expertise. ClusterApp addresses this gap as a web application that performs Principal Coordinate Analysis (PCoA), expanding clustering alternatives in metabolomics. Built on a QIIME 2 Docker image, it enables PCoA computation and Emperor plot visualization. The app supports data input from GNPS, GNPS2, or user-provided spreadsheets. ClusterApp is freely available and can be installed locally as a Docker image or integrated into Jupyter notebooks, offering accessibility and flexibility to diverse users.
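As an illustration of what PCoA computes, the following minimal sketch builds a Bray-Curtis dissimilarity matrix and applies the classical double-centering and eigendecomposition steps. It is not ClusterApp's code: ClusterApp relies on QIIME 2 for this step, and the function names `bray_curtis_matrix` and `pcoa`, as well as the toy intensity table, are assumptions made only for this example.

```python
import numpy as np

def bray_curtis_matrix(X):
    """Pairwise Bray-Curtis dissimilarity between the rows of a non-negative matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = (X[i] + X[j]).sum()
            # Two all-zero samples are treated as identical (assumption for this sketch).
            d = np.abs(X[i] - X[j]).sum() / denom if denom > 0 else 0.0
            D[i, j] = D[j, i] = d
    return D

def pcoa(D, n_axes=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gower-centered matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # B is symmetric
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean dissimilarities can yield negative eigenvalues; clip them to zero.
    top = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(top)
    positive_sum = eigvals[eigvals > 0].sum()
    explained = top / positive_sum if positive_sum > 0 else np.zeros(n_axes)
    return coords, explained

# Hypothetical feature table: 4 samples (rows) x 3 features (columns).
X = np.array([
    [10.0, 0.0, 5.0],
    [12.0, 1.0, 4.0],
    [0.0, 9.0, 1.0],
    [1.0, 11.0, 0.0],
])
D = bray_curtis_matrix(X)
coords, explained = pcoa(D, n_axes=2)
```

In this toy table the first two samples share a profile, as do the last two, so the first coordinate axis should place each pair close together and the two pairs far apart.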

Results

To demonstrate the data preprocessing techniques available in ClusterApp, we analyzed two Liquid Chromatography coupled to Tandem Mass Spectrometry (LC-MS/MS) metabolomics datasets: one exploring metabolomic differences in mouse tissue samples and another investigating coral life history stages. Among the dissimilarity measures available, the Bray-Curtis measure effectively highlighted key metabolomic variations and patterns across both datasets. Targeted filtering enhanced data reliability by eliminating noise while retaining biologically relevant features (10,617 in the coral dataset and 7,341 in the mouse dataset). The combination of Total Ion Current (TIC) normalization and auto-scaling improved clustering resolution, revealing distinct separations in tissue types and life stages. ClusterApp’s flexible features, such as customizable blank removal and group selection, provided tailored analyses, enhancing visualization and interpretation of metabolomic profiles.
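The preprocessing steps named above can be sketched as follows, under stated assumptions. The abstract does not specify ClusterApp's blank removal rule, so the ratio-based filter and its `ratio=0.3` cutoff are hypothetical, as are the function names and the toy table. TIC normalization is taken here to mean dividing each sample by its summed intensity, and auto-scaling to mean centering each feature and dividing by its standard deviation.

```python
import numpy as np

def remove_blank_features(X, is_blank, ratio=0.3):
    """Drop features whose mean blank intensity is not below `ratio` times
    the mean sample intensity, then drop the blank rows themselves.
    The rule and the default cutoff are hypothetical."""
    X = np.asarray(X, dtype=float)
    is_blank = np.asarray(is_blank, dtype=bool)
    blank_mean = X[is_blank].mean(axis=0)
    sample_mean = X[~is_blank].mean(axis=0)
    keep = blank_mean < ratio * sample_mean
    return X[~is_blank][:, keep], keep

def tic_normalize(X):
    """Divide each sample (row) by its total ion current (row sum)."""
    totals = X.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0        # leave all-zero samples unchanged
    return X / totals

def auto_scale(X):
    """Center each feature (column) and scale it to unit sample standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0              # constant features become all zeros
    return (X - mean) / std

# Hypothetical table: 3 biological samples and 1 blank (last row), 3 features.
table = np.array([
    [100.0, 50.0, 10.0],
    [120.0, 40.0, 30.0],
    [80.0, 60.0, 20.0],
    [5.0, 45.0, 1.0],
])
is_blank = [False, False, False, True]

filtered, keep = remove_blank_features(table, is_blank, ratio=0.3)
normalized = tic_normalize(filtered)
scaled = auto_scale(normalized)
```

In this toy table the second feature is nearly as intense in the blank as in the samples, so it is the only one filtered out before normalization and scaling.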

Conclusion

ClusterApp addresses the need for accessible, dynamic tools for exploratory data analysis in metabolomics. By coupling data transformation capabilities with PCoA on multiple dissimilarity matrices, it provides a versatile solution for clustering analysis. Its web interface and Docker-based deployment offer flexibility, accommodating a wide range of use cases through graphical or programmatic interactions. ClusterApp empowers researchers to uncover meaningful patterns and relationships in metabolomics data without requiring cumbersome data manipulation or advanced bioinformatics expertise.
