Multimodal Metagenomic Profiling of Bronchoalveolar Lavage Fluid for Diagnostic Classification of Pulmonary Diseases
Abstract
Recent advances in unbiased metagenomic next-generation sequencing (mNGS) enable simultaneous examination of microbial and host genetic material. In this study, we developed a multimodal, machine learning-based diagnostic approach to differentiate lung cancer from pulmonary infections using 402 bronchoalveolar lavage fluid (BALF) mNGS datasets. The training cohort revealed differences in DNA/RNA microbial composition, bacteriophage abundances, and host responses, including gene expression, transposable element levels, immune cell composition, and tumor fraction derived from copy number variation (CNV). The diagnostic model that integrated these differential features (Model VI) achieved an AUC of 0.937 (95% CI = 0.91–0.964) in the training cohort and 0.847 (95% CI = 0.776–0.918) in the validation cohort for distinguishing lung cancer from pulmonary infections. Applying a composite predictive model based on a rule-in and rule-out strategy significantly improved accuracy (ACC) in distinguishing lung cancer from tuberculosis (ACC = 0.896), fungal infection (ACC = 0.915), and bacterial infection (ACC = 0.907). These findings underscore the potential of cost-effective mNGS-based analysis for early differentiation between lung cancer and pulmonary infections.
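The abstract does not say how the rule-in and rule-out strategy is implemented. One common way to set up such a strategy is with two thresholds on a model's predicted probability, and the sketch below illustrates only that general idea. The function names, the threshold values, and the toy scores and labels are all hypothetical and are not taken from the study.

```python
from typing import List, Optional, Tuple


def rule_in_rule_out(
    scores: List[float],
    rule_out_below: float,
    rule_in_above: float,
) -> List[Optional[str]]:
    """Turn predicted lung-cancer probabilities into calls.

    Assumption (not from the study): scores at or above `rule_in_above`
    are called "cancer", scores at or below `rule_out_below` are called
    "infection", and anything in between is left indeterminate (None).
    """
    if rule_out_below > rule_in_above:
        raise ValueError("rule_out_below must not exceed rule_in_above")
    calls: List[Optional[str]] = []
    for s in scores:
        if s >= rule_in_above:
            calls.append("cancer")
        elif s <= rule_out_below:
            calls.append("infection")
        else:
            calls.append(None)
    return calls


def accuracy_on_called(
    calls: List[Optional[str]], labels: List[str]
) -> Tuple[float, int]:
    """Accuracy over samples that received a call, plus how many were called."""
    pairs = [(c, y) for c, y in zip(calls, labels) if c is not None]
    if not pairs:
        return float("nan"), 0
    correct = sum(1 for c, y in pairs if c == y)
    return correct / len(pairs), len(pairs)


# Toy data, invented for illustration only.
toy_scores = [0.95, 0.80, 0.55, 0.40, 0.10, 0.05]
toy_labels = ["cancer", "cancer", "infection", "cancer", "infection", "infection"]

toy_calls = rule_in_rule_out(toy_scores, rule_out_below=0.2, rule_in_above=0.7)
toy_acc, toy_n = accuracy_on_called(toy_calls, toy_labels)
print(toy_calls, toy_acc, toy_n)
```

In a scheme like this, accuracy is computed only over the samples that receive a call, so it should be reported alongside the fraction of samples left indeterminate.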