A Comparative Study of Machine Learning Model for Early Detection of Heart Failure in Bangladesh: Reducing Disease Burden and Improving Healthcare Services through Data-Driven Knowledge
Abstract
Background: Heart failure (HF) is a leading cause of morbidity and mortality globally. Lower-middle-income countries such as Bangladesh face particular difficulty in managing heart failure because proper predictive tools are unavailable, which delays diagnosis and increases the disease burden nationwide. Traditional risk models often fail to capture the intricate interactions between clinical variables. This study investigates the utility of machine learning (ML) models for predicting heart failure using clinical, demographic, and biomarker data from Bangladeshi subjects.
Methods: A cross-sectional observational study was conducted at a major hospital in Dhaka. A total of 44 features, comprising demographic, clinical, laboratory, and imaging data, were extracted from face-to-face patient interviews and medical records. After thorough preprocessing and feature selection guided by SHAP (SHapley Additive exPlanations) values and clinical insight, 15 machine learning algorithms were trained and evaluated using stratified 10-fold cross-validation. Hyperparameters were tuned with Optuna, a hyperparameter optimization framework. Model performance was evaluated using accuracy, precision, recall, F1 score, and AUC (area under the curve).
Results: After tuning with Optuna, random forest achieved the highest mean accuracy (88.75%) and AUC (92.57%) under 10-fold cross-validation. SHAP analysis identified fractional shortening (FS), left ventricular ejection fraction (LVEF), NYHA class, and BNP (brain natriuretic peptide) as the most influential features. Ensemble methods such as bagging and boosting showed slightly greater stability but did not outperform the best individual models. The learning curves and precision-recall curves further supported model reliability and generalizability.
Conclusion: Machine learning models, especially Random Forest and LightGBM, demonstrated strong predictive performance in the healthcare setting of Bangladesh. These models offer a promising, interpretable tool for early detection of heart failure and could aid timely intervention in resource-limited settings.
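To make the evaluation protocol named in the Methods more concrete, the following is a minimal, dependency-free Python sketch of stratified fold assignment and of the five reported metrics for a binary outcome. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`stratified_folds`, `binary_metrics`, `roc_auc`) are hypothetical, the labels are assumed to be coded 1 (heart failure) and 0 (no heart failure), and AUC is assumed here to mean the area under the ROC curve, which the abstract does not state explicitly. In practice a library such as scikit-learn would typically be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up labels and scores (not data from the study).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_score]
print(binary_metrics(toy_true, toy_pred))
print(roc_auc(toy_true, toy_score))
```

In a 10-fold run, each fold would serve once as the held-out test set while a model is fitted on the other nine, and the per-fold metrics would then be averaged, which is how a mean accuracy and mean AUC such as those reported above are usually obtained.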