Microbiomes without boundaries: cystic fibrosis ‘pulmotype’ classifications are dependent on algorithm choice and database size, and indicate continuous variation

Conan Y. Zhao
Ryan Lowhorn
Haojun Song
Jinyoung Eum
Sam P. Brown


Abstract

A common response to microbiome sample variation is to use clustering algorithms to reduce complex and variable datasets to a smaller number of ‘types’ (e.g. enterotypes for gut samples, or pulmotypes for lung samples). Recent analyses have reported distinct clustering solutions for datasets that are similar in principle. We therefore examine the extent to which clustering solutions depend on researcher choices of algorithm and dataset, using cystic fibrosis (CF) sputum microbiome data as a model system. Following a structured literature review, we identified 36 CF microbiome studies with publicly available samples and metadata. From these studies we curated a dataset of 4026 sputum microbiome samples from 1184 people with CF (pwCF), complete with matched individual metadata, using a standardized bioinformatic platform. Applying multiple clustering algorithms (Dirichlet multinomial mixtures [DMM], k-means, and partitioning around medoids [PAM]) to cross-sectional data, we find that the optimal clustering varies with both algorithm choice and database size, with generally weak separation among clusters in any classification. Our longitudinal analyses highlight substantial persistence of cluster types over time, with transitions most common among clusters that are structurally similar, reflecting an underlying continuous landscape of microbiome variation. While transitions among similar clusters are common (e.g. along gradients of Pseudomonas aeruginosa relative abundance), they are generally bi-directional, with no clear pathogen-dominated ‘end point’ states. Using samples from 482 pwCF with available lung function data, we find that taxon-based statistical models outperform cluster-based models in predicting clinical lung function. Together, our results highlight that clustering methods can impose arbitrary boundaries on an underlying continuum of microbiome variation.
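The point about arbitrary boundaries can be illustrated with a minimal sketch on synthetic data. This is not the authors' pipeline or data: the three-taxon compositions, the plain Lloyd's k-means, and the `boundary_gap_ratio` heuristic are all assumptions introduced here for illustration. Samples are drawn along a smooth gradient of one hypothetical dominant taxon, so there are no true groups, yet k-means still returns a partition for every requested k.

```python
import numpy as np

# Synthetic, hypothetical data: 300 samples along a continuous gradient of one
# dominant taxon, with the remainder split between two other taxa.
rng = np.random.default_rng(0)
n = 300
dominant = rng.uniform(0.0, 1.0, size=n)
split = rng.dirichlet([2.0, 2.0], size=n)  # shape (n, 2), rows sum to 1
X = np.column_stack([dominant, (1.0 - dominant)[:, None] * split])


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    init_rng = np.random.default_rng(seed)
    centroids = X[init_rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


def boundary_gap_ratio(X, labels):
    """Illustrative heuristic (an assumption, not a standard index):
    nearest cross-cluster distance divided by mean within-cluster distance.
    Values well below 1 mean the boundary cuts through dense data."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)
    return D[~same].min() / D[same & off_diag].mean()


for k in (2, 3, 4):
    labels, _ = kmeans(X, k)
    print(k, np.bincount(labels, minlength=k), boundary_gap_ratio(X, labels))
```

In this toy setting every value of k yields a tidy-looking set of labels, and the gap between neighbouring clusters is small compared with the spread inside them, which is what a boundary drawn across a continuum looks like. A real analysis would use an established separation measure and real abundance data.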

Importance

Classifying microbiome samples into discrete “types” is a widely used strategy for simplifying complex microbial community data and linking community structure to clinical outcomes. Here we evaluate the utility of cluster-based microbiome typing schemes, using cystic fibrosis (CF) sputum samples as a model system. We conduct a comprehensive re-analysis of over 4000 sputum samples from more than 1000 people with CF. We show that pulmotype classification outcomes are highly sensitive to the choice of clustering algorithm and dataset size, and that clustering can impose artificial boundaries on a continuous landscape of microbial variation. Our findings urge caution over the use of discrete microbiome classifications and emphasize the value of taxon-based models in capturing the ecology and clinical relevance of complex microbial communities.
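The contrast between taxon-based and cluster-based models can also be sketched with synthetic data. Everything below is an assumption made for illustration: a single hypothetical taxon, an invented linear relationship with an arbitrary outcome score, and a median split standing in for a two-cluster typing scheme. It shows only the general information loss from discretizing a continuous predictor, not a reproduction of the study's result.

```python
import numpy as np

# Synthetic, hypothetical data: the outcome declines linearly with the relative
# abundance of one taxon, plus noise. All coefficients are arbitrary.
rng = np.random.default_rng(1)
n = 500
abundance = rng.uniform(0.0, 1.0, size=n)
outcome = 90.0 - 40.0 * abundance + rng.normal(0.0, 10.0, size=n)


def r_squared(predictor, y):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), predictor])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()


# "Cluster-based" predictor: a two-type label from a median split of abundance.
cluster = (abundance > np.median(abundance)).astype(float)

r2_taxon = r_squared(abundance, outcome)
r2_cluster = r_squared(cluster, outcome)
print(f"taxon-based R^2:   {r2_taxon:.3f}")
print(f"cluster-based R^2: {r2_cluster:.3f}")
```

When the underlying relationship is continuous, as assumed here, the type label discards within-cluster variation that the continuous abundance retains, so the taxon-based fit explains more variance.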

Version published to 10.1101/2025.05.07.25326893 on medRxiv
May 8, 2025

Related articles

Altered airway microbiota and microbial biomarkers across respiratory diseases: insights from 16S rDNA sequencing

This article has 9 authors:
1. Huifang Song
2. Maoye Xu
3. Jia Wang
4. Lemei Mo
5. Chunmeng Shan
6. Xuejie Bai
7. Na Ta
8. Cuicui Liu
9. Dejun Sun
This article has no evaluations. Latest version: Dec 18, 2025

WITHDRAWN: Gut Dysbiosis and Intestinal Inflammation in Indian Children with Severe Acute Malnutrition: A Case-Control Study

This article has 2 authors:
1. Amrinder Kaur
2. Rakesh Pal
This article has no evaluations. Latest version: Jan 26, 2026

PIK3CA-Mutated Colorectal Cancer Exhibits a Unique Gut Dysbiosis Profile: Insights from a Nationwide Pan-Cancer Screen in Japan

This article has 21 authors:
1. Shogen Boku
2. Shunsuke Sakai
3. Kentaro Sawada
4. Satoshi Horasawa
5. Ayumu Yoshikawa
6. Yoshiaki Nakamura
7. Takao Fujisawa
8. Riu Yamashita
9. Hironaga Satake
10. Yoshito Komatsu
11. Tomohiro Nishina
12. Manabu Shiozawa
13. Takatsugu Ogata
14. Nobuhisa Matsuhashi
15. Kentaro Yamazaki
16. Toshifumi Yamaguchi
17. Hisateru Yasui
18. Naoki Takahashi
19. Shigenori Kadowaki
20. Tadamichi Denda
21. Takayuki Yoshino
This article has no evaluations. Latest version: Jan 12, 2026
