Microbiomes without boundaries: cystic fibrosis ‘pulmotype’ classifications are dependent on algorithm choice and database size, and indicate continuous variation



Abstract

A common response to variation among microbiome samples is to use clustering algorithms to reduce complex and variable datasets to a small number of ‘types’ (e.g. enterotypes for gut samples, or pulmotypes for lung samples). In light of recent analyses that report distinct clustering solutions for datasets that are, in principle, similar, we examine the extent to which clustering solutions depend on researcher choices of algorithm and dataset, using cystic fibrosis (CF) sputum microbiome data as a model system. Following a structured literature review, we identified 36 CF microbiome studies with publicly available samples and metadata. From these studies we curated a dataset of 4026 sputum microbiome samples from 1184 people with CF (pwCF), complete with matched individual metadata, using a standardized bioinformatic platform. Applying multiple clustering algorithms (Dirichlet multinomial mixtures (DMM), k-means, and partitioning around medoids (PAM)) to cross-sectional data, we find that the optimal partitioning depends on both the choice of algorithm and database size, with generally weak separation among clusters in any classification. Our longitudinal analyses highlight substantial persistence of cluster types over time, with transitions most common among clusters that are structurally similar, reflecting an underlying continuous landscape of microbiome variation. Although transitions among similar clusters are common (e.g. along gradients of Pseudomonas aeruginosa relative abundance), they are generally bi-directional, with no clear pathogen-dominated ‘end point’ states. Together, our results highlight that clustering tools can generate arbitrary and inconsistent boundaries on a continuously varying landscape of microbiome variation.
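
The longitudinal claims above (persistence of cluster types and bi-directional transitions) can be made concrete with a transition tally. The following is a minimal sketch only, not the authors' pipeline: the function names `transition_counts` and `persistence`, the cluster labels A, B and C, and the toy trajectories are all hypothetical, and each person's samples are assumed to be already ordered in time and already assigned a cluster label.

```python
from collections import Counter


def transition_counts(trajectories):
    """Count (from_label, to_label) pairs over consecutive samples per person.

    `trajectories` maps a person ID to a time-ordered list of cluster labels.
    """
    counts = Counter()
    for labels in trajectories.values():
        # Pair each sample's label with the label of the next sample.
        for current, following in zip(labels, labels[1:]):
            counts[(current, following)] += 1
    return counts


def persistence(counts):
    """Fraction of consecutive sample pairs that stay in the same cluster."""
    total = sum(counts.values())
    stays = sum(n for (a, b), n in counts.items() if a == b)
    return stays / total if total else float("nan")


# Hypothetical toy data: three people, four time-ordered samples each.
toy = {
    "p1": ["A", "A", "B", "A"],
    "p2": ["B", "B", "B", "C"],
    "p3": ["C", "B", "B", "A"],
}

counts = transition_counts(toy)
print("persistence:", round(persistence(counts), 3))
# Compare each direction of a transition to check for bi-directionality.
for a, b in [("A", "B"), ("B", "C")]:
    print(f"{a}->{b}: {counts[(a, b)]}, {b}->{a}: {counts[(b, a)]}")
```

In a tally of this kind, a high persistence value together with similar counts in both directions between neighbouring clusters would match the pattern the abstract describes, whereas a pathogen-dominated ‘end point’ state would appear as a label that is entered far more often than it is left.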
