A Comparative Study of Machine Learning Model for Early Detection of Heart Failure in Bangladesh: Reducing Disease Burden and Improving Healthcare Services through Data-Driven Knowledge

Mahmood Hasan Khan

Abstract

Background: Heart failure (HF) is a leading cause of morbidity and mortality globally. Lower-middle-income countries like Bangladesh face particular challenges in managing heart failure because suitable predictive tools are unavailable, which leads to delayed diagnosis and increases the disease burden nationwide. Traditional risk models often fail to capture the intricate interactions between clinical variables. This study investigates the utility of machine learning (ML) models for predicting heart failure using clinical, demographic, and biomarker data from Bangladeshi subjects.
Methods: A cross-sectional observational study was conducted at a major hospital in Dhaka. A total of 44 features, comprising demographic, clinical, laboratory, and imaging data, were extracted from face-to-face patient interviews and medical records. After thorough preprocessing and feature selection using SHAP (SHapley Additive exPlanations) values and clinical insight, 15 machine learning algorithms were trained and evaluated using stratified 10-fold cross-validation. Hyperparameters were tuned with Optuna, a hyperparameter optimization framework. Model performance was evaluated using accuracy, precision, recall, F1 score, and AUC (area under the curve).
Results: Among all the models, random forest achieved the highest mean accuracy (88.75%), with an AUC of 92.57%, under 10-fold cross-validation after tuning with Optuna. SHAP analysis identified fractional shortening (FS), left ventricular ejection fraction (LVEF), NYHA class, and BNP (brain natriuretic peptide) as the most significant features. Ensemble methods like bagging and boosting were slightly more stable but did not outperform the best individual models. The learning and precision-recall curves further confirmed model reliability and generalizability.
Conclusion: Machine learning models, especially Random Forest and LightGBM, performed well in the health care setting of Bangladesh. These models offer a promising, interpretable tool for early detection of heart failure and could thus aid timely intervention in resource-limited settings.
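
The evaluation protocol named in the abstract (stratified k-fold splitting, then accuracy, precision, recall, F1 score, and AUC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names `stratified_folds`, `binary_metrics`, and `roc_auc` are hypothetical, the labels are toy data, and the study's 15 algorithms, Optuna tuning, and SHAP-based feature selection are not reproduced here.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class ratios.

    Hypothetical helper: indices of each class are shuffled and then
    dealt round-robin across the folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels only (30 positives, 70 negatives); not data from the study.
toy_labels = [1] * 30 + [0] * 70
toy_folds = stratified_folds(toy_labels, k=10)
```

In a full pipeline, each fold would serve once as the held-out test set while a model is fitted on the other nine, and the per-fold metrics would be averaged to give mean figures of the kind the abstract reports.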

Version published to 10.21203/rs.3.rs-7532602/v1 on Research Square
Oct 22, 2025

Related articles

Improving Type 2 Diabetes Prediction: Comparative Evaluation of Machine Learning Classifiers Using Balanced Data from the AWI-Gen Cohort

This article has 1 author:
1. Richmond Balinia Adda
This article has no evaluations. Latest version Nov 4, 2025

Clinical prediction model for the risk of bleeding during hospitalization in patients with acute myocardial infarction: a retrospective cohort study from the MIMIC-IV database

This article has 5 authors:
1. ZIjie Bai
2. Tongxian Hou
3. Pengyu Lu
4. Huiqin Li
5. Jieyun Liu
This article has no evaluations. Latest version Nov 11, 2025

Explainable Machine Learning Models for Predicting Health-Related Quality of Life in High-Risk Cardiovascular Populations: A Comparative Analysis of SF-12 Data and Clinical Risk Stratification

This article has 7 authors:
1. Guoliang Ma
2. Xin Hong
3. Lin Zhu
4. Wenting Li
5. Zhuanzhuan Fan
6. Kun Li
7. Wenyan Wang
This article has no evaluations. Latest version Sep 30, 2025
