Fairness-Aware Machine Learning for Heart Failure Prediction: Performance, Bias, and Clinical Deployment Insights

Abstract

Heart failure (HF) prediction models based on machine learning (ML) must balance predictive performance, fairness, and real-world clinical utility. This paper evaluates ML and deep learning (DL) models on two heterogeneous datasets (UCI and MIMIC) and aims to derive practical strategies for equitable deployment in healthcare. Although the Transformer models achieved a high AUC-ROC on the UCI dataset (0.986), their performance degraded considerably on the MIMIC dataset, suggesting limited generalizability. We also observed opposite gender-related performance disparities across models: SVM performed better for females (AUC-ROC = 0.869 vs. 0.796 for males), whereas XGBoost performed better for males. Decision curve analysis (DCA) showed that a higher AUC-ROC did not necessarily translate into greater clinical utility: at a threshold probability of 0.5, SVM achieved a higher net benefit (0.050) than the other models. SHAP analysis identified sex, ejection fraction, and NT-proBNP as the main predictors, but revealed inconsistent feature-outcome relationships between the datasets. We argue that (1) gender-specific threshold optimization, (2) ensemble methods to counteract bias, and (3) robust external validation are necessary, and we outline a possible path toward clinically deployable, fairness-aware HF prediction tools.
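The abstract relies on three quantities: AUC-ROC computed separately for females and males, the DCA net benefit at a 0.5 threshold, and group-specific decision thresholds. The sketch below is a minimal, dependency-free Python illustration of how such quantities are commonly computed; it is not the authors' code. It uses the standard DCA net-benefit formula NB = TP/n - (FP/n) * p_t / (1 - p_t), a pairwise-ranking definition of AUC-ROC, and, as an assumed stand-in for "gender-specific threshold optimization", one Youden's J cutoff per group. All function names and the toy data are hypothetical.

```python
"""Hypothetical sketch: subgroup AUC-ROC, DCA net benefit, per-group cutoffs."""
from typing import Dict, List, Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            # A correctly ordered pair counts 1, a tie counts 0.5.
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def youden_threshold(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Observed score maximizing Youden's J = sensitivity + specificity - 1."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes to choose a cutoff")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        tn = sum(1 for y, s in zip(y_true, y_score) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest cutoff among ties
            best_t, best_j = t, j
    return best_t


def _split(values: Sequence, groups: Sequence[str], g: str) -> List:
    """Select the entries of `values` that belong to group `g`."""
    return [v for v, gi in zip(values, groups) if gi == g]


def auc_by_group(y_true, y_score, groups) -> Dict[str, float]:
    """AUC-ROC computed separately within each subgroup (e.g. sex)."""
    return {g: auc_roc(_split(y_true, groups, g), _split(y_score, groups, g))
            for g in sorted(set(groups))}


def thresholds_by_group(y_true, y_score, groups) -> Dict[str, float]:
    """One Youden-optimal cutoff per subgroup (an assumed form of group-specific thresholds)."""
    return {g: youden_threshold(_split(y_true, groups, g), _split(y_score, groups, g))
            for g in sorted(set(groups))}


if __name__ == "__main__":
    # Toy data for illustration only; not from the paper.
    y = [1, 0, 1, 0, 1, 0, 0, 1]
    p = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1, 0.3, 0.8]
    sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(auc_by_group(y, p, sex))         # {'F': 0.75, 'M': 1.0}
    print(net_benefit(y, p, 0.5))          # 0.25
    print(thresholds_by_group(y, p, sex))  # {'F': 0.6, 'M': 0.4}
```

In a full decision curve analysis, net benefit would be evaluated over a range of threshold probabilities and compared against the "treat all" and "treat none" strategies, rather than at a single cutoff. The pairwise AUC is quadratic in the number of cases, which is fine for illustration; a rank-based implementation scales better on datasets the size of MIMIC.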