Fairness-Aware Machine Learning for Heart Failure Prediction: Performance, Bias, and Clinical Deployment Insights

Elvin Ugonna Eziama
Sandra Ijeoma Eziama
Victor Ifechukwude Agboli
Tope Amusa
Edokpolor Harrison
Mariam Iyabo Adeoba
Idowu A. Usman
Victor Eghujovbo
Ibrahim Olasege
Oladapo Ogunnaike
Timothy O. Ige
Deborah Okunola
Folayan Tobiloba Taiwo
Joy Nma Anyanacho
Rukayat Olasege
Adekunle Adeoye

Abstract

Heart failure (HF) prediction models using machine learning (ML) must balance predictive performance, fairness, and real-world clinical utility. This paper assesses the potential of ML and deep learning (DL) models on two heterogeneous datasets (UCI and MIMIC) and aims to derive practical strategies for their equitable deployment in healthcare. Although Transformer models achieved a high AUC-ROC on the UCI data (0.986), they suffered considerable performance degradation on the MIMIC data, suggesting limited generalizability. We also found contrasting gender-specific performance: SVM performed better for females (AUC-ROC = 0.869 vs 0.796 for males), while XGBoost favored males. Decision curve analysis (DCA) showed that a higher AUC-ROC did not necessarily translate into greater clinical utility: at a threshold probability of 0.5, SVM yielded a higher net benefit (0.050) than the other models. SHAP analysis confirmed sex, ejection fraction, and NT-proBNP as the main predictors, but revealed inconsistent feature relationships across datasets. We argue that (1) gender-specific threshold optimization, (2) ensemble methods to counteract bias, and (3) robust external validation are necessary, and that together they offer a possible path toward clinically deployable, fairness-aware HF prediction tools.
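The abstract relies on two evaluation ideas: AUC-ROC disaggregated by sex, and decision-curve net benefit at a fixed threshold probability. The following is a minimal Python sketch of both. It is not the authors' code, and the function names and toy data are hypothetical. AUC-ROC is computed in its Mann-Whitney form, as the probability that a random positive case outranks a random negative one. Net benefit follows the standard decision-curve definition NB = TP/n - (FP/n) * p_t / (1 - p_t). At p_t = 0.5 the odds weight equals 1, so NB reduces to (TP - FP)/n. This is why a model with a higher AUC-ROC, which summarizes ranking over all thresholds, can still have a lower net benefit at that single threshold.

```python
from typing import Dict, Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating everyone with predicted risk >= threshold.

    NB = TP/n - FP/n * (threshold / (1 - threshold))
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def auc_roc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(
    y_true: Sequence[int], y_prob: Sequence[float], groups: Sequence[str]
) -> Dict[str, float]:
    """Compute AUC-ROC separately within each subgroup (e.g. sex)."""
    result: Dict[str, float] = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auc_roc([y_true[i] for i in idx], [y_prob[i] for i in idx])
    return result


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the UCI or MIMIC datasets.
    y = [1, 0, 1, 0, 1, 0, 1, 0]
    p = [0.9, 0.4, 0.7, 0.6, 0.3, 0.2, 0.8, 0.55]
    sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(net_benefit(y, p, 0.5))   # 0.125
    print(auc_by_group(y, p, sex))  # {'F': 1.0, 'M': 0.75}
```

In this toy example the model ranks female cases perfectly but misranks one male pair. This is the kind of subgroup gap the abstract reports for SVM and XGBoost, which a pooled AUC-ROC would hide.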

Version published to 10.1101/2025.10.17.25338263 on medRxiv, Oct 19, 2025

Related articles

Algorithm Fairness in Predicting Unmet Preventive Care: Evidence from 16 European Countries using SHARE

This article has 10 authors:
1. Toby Kai-Bo Shen
2. Vincent Cheng Sheng Li
3. Nick Meng-Huan Chen
4. Jennifer Sheng Hui Hsu
5. Rifat Atun
6. Valerie Tzu Ning Liu
7. Charlotte Wang
8. David Bin-Chia Wu
9. Pin-Chun Yeh
10. John Tayu Lee
This article has no evaluations. Latest version: Sep 10, 2025

Comprehensive, Transparent, and Fair Machine Learning Models for Hypertension Risk Prediction: Benchmarking With Framingham, External Validation, Individual-Level Analysis, and Equitable Clinical Utility

This article has 2 authors:
1. Parsa Amirian
2. Mahsa Zarpoosh
This article has no evaluations. Latest version: Sep 5, 2025

Model uncertainty quantification: A post hoc calibration approach for heart disease prediction

This article has 3 authors:
1. Peter Adebayo Odesola
2. Adewale Alex Adegoke
3. Idris Babalola
This article has no evaluations. Latest version: Sep 30, 2025