Development and validation of a clinically applicable diagnostic model for invasive pulmonary aspergillosis in patients with structural lung diseases
Abstract
Background Patients with structural lung disease are prone to lower respiratory tract infections, particularly those caused by Aspergillus, because of irreversible damage to the lung parenchyma and interstitium. Early diagnosis of invasive Aspergillus infection is difficult, and delayed treatment is associated with high mortality. In this study, we therefore developed a diagnostic prediction model for invasive Aspergillus infection in patients with structural lung disease to support its early detection.
Methods We conducted a retrospective cohort study of inpatients with structural lung diseases admitted to Beijing Chest Hospital between January 1, 2020, and December 31, 2024. Data were divided into a training set (70%) and a validation set (30%) by stratified random sampling to maintain proportional representation of key demographics. For variable selection, we first performed univariate analysis to identify potential predictors of invasive pulmonary aspergillosis (IPA) in patients with structural lung diseases, and variables with P < 0.1 were retained. We then applied LASSO regression with 10-fold cross-validation to determine feature importance weights. Based on the combined criteria of variable significance (P < 0.05) and odds ratio magnitude, the top five candidate predictors were entered into a stepwise multivariable logistic regression model. The final prediction model was visualized as a nomogram incorporating the selected risk factors.
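The split and univariate screening steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code. It makes several assumptions:

* Candidate predictors are binary (0/1).
* The 70/30 split is stratified on the outcome label only, because the abstract does not say which demographics were used.
* The univariate test is a Pearson chi-square test with 1 degree of freedom.
* The helper names `stratified_split`, `chi2_pvalue_2x2` and `univariate_screen` and the toy data are hypothetical.

The LASSO and stepwise regression stages are not shown.

```python
import math
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices so that each outcome class is sampled at train_frac (stratified by label)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)

def chi2_pvalue_2x2(x, y):
    """Pearson chi-square test of independence between two binary (0/1) lists."""
    n = len(x)
    counts = [[0, 0], [0, 0]]  # counts[a][b] = number of samples with x == a and y == b
    for a, b in zip(x, y):
        counts[a][b] += 1
    row = [counts[a][0] + counts[a][1] for a in (0, 1)]
    col = [counts[0][b] + counts[1][b] for b in (0, 1)]
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = row[a] * col[b] / n
            if expected > 0:
                stat += (counts[a][b] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))

def univariate_screen(features, outcome, alpha=0.1):
    """Return {name: p} for predictors whose univariate P-value is below alpha."""
    kept = {}
    for name, values in features.items():
        p = chi2_pvalue_2x2(values, outcome)
        if p < alpha:
            kept[name] = p
    return kept

# Toy data (hypothetical, not study data): 84 cases and 120 controls
outcome = [1] * 84 + [0] * 120
features = {
    "cavity": [1] * 60 + [0] * 24 + [1] * 20 + [0] * 100,  # strongly associated with outcome
    "noise": [1, 0] * 42 + [1, 0] * 60,  # 50% prevalence in both groups of the full toy data
}

# Stratified 70/30 split, then screen predictors on the training subset only
train_idx, test_idx = stratified_split(outcome)
train_y = [outcome[i] for i in train_idx]
train_x = {name: [vals[i] for i in train_idx] for name, vals in features.items()}
kept = univariate_screen(train_x, train_y)
print(len(train_idx), len(test_idx), sorted(kept))
```

Screening is run on the training subset so that the held-out validation set plays no part in variable selection.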
Model performance was evaluated on four criteria:

* Discrimination: receiver operating characteristic (ROC) curve analysis with the area under the curve (AUC).
* Calibration: the Hosmer-Lemeshow goodness-of-fit test.
* Clinical utility: decision curve analysis (DCA) and the clinical impact curve (CIC).
* Diagnostic accuracy: sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).

To assess generalizability, six machine learning algorithms were used for comparative validation: Gaussian Naive Bayes (GNB), decision tree (DT), K-nearest neighbors (KNN), random forest (RF), support vector machine (SVM) and XGBoost. Ensemble learning techniques were applied to optimize performance across these algorithms.
Results A total of 204 eligible inpatients were included: 84 with invasive Aspergillus infection (IAI) and 120 without. After variable selection by LASSO regression, multivariable logistic regression identified four independent risk factors: coexisting diabetes, cavitation on imaging, blood Aspergillus IgG antibody, and bronchoalveolar lavage fluid metagenomic next-generation sequencing (BALF-mNGS). The AUC of the diagnostic model was 0.88 (95% CI 0.82–0.94), and a nomogram was constructed to visualize the model. At the optimal cutoff value (0.431), sensitivity and specificity in the validation set were 0.81 (95% CI 0.68–0.93) and 0.92 (95% CI 0.81–1.00), respectively, with a PPV of 0.94 (95% CI 0.85–1.00), indicating good diagnostic performance. Validation with the six machine learning classifiers showed stable performance, with AUCs ranging from 0.884 to 0.987: XGBoost 0.977 (95% CI 0.960–0.994), GNB 0.890 (95% CI 0.841–0.939), decision tree 0.987 (95% CI 0.976–0.998), SVM 0.884 (95% CI 0.828–0.939), KNN 0.909 (95% CI 0.860–0.946), and random forest 0.979 (95% CI 0.963–0.996).
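The cutoff-based diagnostic metrics, the AUC and the decision-curve net benefit named in the Methods can be computed from predicted probabilities, as in the plain-Python sketch below. It is an illustration, not the authors' analysis code, and rests on these assumptions:

* The function names and the ten toy probabilities are hypothetical.
* A case is called positive when its probability is at or above the cutoff. The abstract does not state how values exactly at the cutoff were handled.
* The AUC uses the Mann-Whitney formulation.
* Net benefit uses the standard decision-curve definition NB = TP/n - (FP/n) * p_t/(1 - p_t).

Confidence intervals and the Hosmer-Lemeshow test are not shown.

```python
def confusion_counts(y_true, y_prob, cutoff):
    """Count TP, FP, TN, FN, calling a case positive when its probability >= cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def diagnostic_metrics(y_true, y_prob, cutoff):
    """Sensitivity, specificity, PPV and NPV at a fixed probability cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, cutoff)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when the denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def roc_auc(y_true, y_prob):
    """AUC = probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of treating everyone with probability >= threshold (0 < threshold < 1)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Hypothetical toy predictions (not study data); 0.431 is the cutoff reported in the abstract
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.5, 0.2, 0.1, 0.4, 0.05, 0.35]
metrics = diagnostic_metrics(y_true, y_prob, 0.431)
auc = roc_auc(y_true, y_prob)  # 21 of 24 positive-negative pairs correctly ordered -> 0.875
nb = net_benefit(y_true, y_prob, 0.431)
print(metrics, auc, nb)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies.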
Conclusions A multimodal diagnostic prediction model integrating clinical, imaging, and microbiological data, corroborated by machine learning classifiers, can effectively identify invasive aspergillosis in patients with structural lung diseases.