Development and External Validation of an Interpretable Machine-Learning Model for HFpEF Comorbidity Risk in COPD Patients


Abstract

BACKGROUND: Chronic obstructive pulmonary disease (COPD) and heart failure with preserved ejection fraction (HFpEF) frequently coexist, leading to increased hospitalization, mortality, and healthcare burden. Early identification of HFpEF risk in COPD patients is critical for timely intervention.

AIM: To develop and validate an interpretable machine learning (ML) model for predicting HFpEF risk in COPD patients and to identify key predictors using explainable artificial intelligence techniques.

METHODS: This retrospective study analyzed 1,550 COPD patients, divided into COPD-only and COPD-HFpEF groups. Feature selection was performed using LASSO regression, logistic regression, and Boruta random forest. Ten ML models were developed and evaluated on an internal test set, and the best-performing model was further validated on an external cohort (n = 69). Model interpretability was assessed using SHapley Additive exPlanations (SHAP).

RESULTS: Nine predictors were consistently selected: NT-proBNP, red blood cell count, fibrinogen, cholesterol, arterial PaO₂, inspiratory capacity (IC), IC% predicted, late diastolic mitral inflow velocity, and the COPD Assessment Test score. The XGBoost model achieved the best performance, with an AUC of 0.898 (95% CI: 0.867–0.929) on the internal test set and 0.851 (95% CI: 0.753–0.948) on external validation. SHAP analysis identified NT-proBNP as the most influential predictor.

CONCLUSION: The developed XGBoost model accurately predicts HFpEF risk in COPD patients and offers clinically interpretable insights into key risk factors, supporting early identification and stratified management.
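The abstract reports each AUC with a 95% confidence interval but does not state how the intervals were computed. As a rough illustration only, the sketch below shows one common approach, a percentile bootstrap, using only the Python standard library. The function names `auc` and `bootstrap_auc_ci`, the resample count, and the toy data are all assumptions made for this example; none of them come from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC.
    (Assumed method: the abstract does not say which one was used.)"""
    auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic toy data (NOT the study's data): 69 cases, matching only the
# size of the external cohort, with hypothetical risk scores.
toy_rng = random.Random(42)
toy_labels = [1 if i < 25 else 0 for i in range(69)]
toy_scores = [toy_rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in toy_labels]

point = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"AUC = {point:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

With a cohort this small, resampled AUCs vary considerably, which is consistent with the external-validation interval (0.753–0.948) being wider than the internal one (0.867–0.929).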
