Machine Learning Insights for Cardiovascular Risk Prediction in Diabetic Patients: Emphasis on Renal and Cardiac Markers Using Random Forests


Abstract

Objectives: Cardiovascular disease remains a leading cause of morbidity and mortality among individuals with diabetes. Although machine learning approaches are increasingly proposed for cardiovascular risk prediction, many published studies report optimistic performance because of inadequate validation and limited reproducibility. This study evaluates whether standard, interpretable machine learning models can predict heart failure mortality when assessed with a rigorously validated and fully reproducible analytic pipeline.

Methods: Two publicly available datasets from the UCI Machine Learning Repository were analyzed: the Early Stage Diabetes Risk Prediction Dataset (n = 520) and the Heart Failure Clinical Records Dataset (n = 299). The heart failure dataset was used exclusively for model development and outcome evaluation. Logistic regression and random forest classifiers were trained and evaluated using stratified five-fold cross-validation. All reported performance metrics were computed from pooled out-of-fold predictions. Preprocessing and any class-imbalance handling were performed within training folds only to prevent information leakage. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC), with sensitivity and specificity reported to characterize classification tradeoffs.

Results: Under pooled out-of-fold evaluation, the random forest model showed higher discriminative performance than logistic regression (AUC 0.91 versus 0.86). Random forest exhibited higher specificity, whereas logistic regression showed higher sensitivity, reflecting distinct error profiles across models. Feature importance analyses and SHAP-based explanations consistently identified serum creatinine, ejection fraction, age, and follow-up time as dominant predictors of heart failure mortality.
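The evaluation protocol described in the Methods (stratified five-fold splitting, preprocessing fitted on training folds only, and metrics computed once from pooled out-of-fold predictions) can be illustrated with a minimal standard-library sketch. Everything here is an assumption for illustration and is not the authors' pipeline: the data are synthetic, a toy nearest-centroid scorer stands in for the logistic regression and random forest models, and the helper names (`stratified_folds`, `pooled_oof_scores`, `auc`, `sensitivity_specificity`) are hypothetical.

```python
import random
from statistics import mean, pstdev


def stratified_folds(y, k=5, seed=0):
    """Assign each sample to one of k folds, keeping class ratios similar per fold."""
    rng = random.Random(seed)
    fold_of = [0] * len(y)
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            fold_of[i] = pos % k
    return fold_of


def fit_centroids(X, y):
    """Toy stand-in model (assumption): one centroid per class."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [mean(col) for col in zip(*rows)]
    return centroid(0), centroid(1)


def score(model, x):
    """Positive score means x lies closer to the positive-class centroid."""
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return d0 - d1


def pooled_oof_scores(X, y, k=5, seed=0):
    """Return one out-of-fold score per sample, pooled across all k folds."""
    fold_of = stratified_folds(y, k, seed)
    oof = [None] * len(y)
    for f in range(k):
        train = [i for i in range(len(y)) if fold_of[i] != f]
        held_out = [i for i in range(len(y)) if fold_of[i] == f]
        # Scaling statistics come from the training fold only (no leakage).
        cols = list(zip(*(X[i] for i in train)))
        mu = [mean(c) for c in cols]
        sd = [pstdev(c) or 1.0 for c in cols]

        def scale(row):
            return [(v - m) / s for v, m, s in zip(row, mu, sd)]

        model = fit_centroids([scale(X[i]) for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = score(model, scale(X[i]))
    return oof


def auc(y_true, scores):
    """AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.0):
    """Sensitivity and specificity when scores above threshold are called positive."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s > threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s <= threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s <= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic two-feature data (assumption): positives are shifted from negatives.
rng = random.Random(42)
y = [1] * 100 + [0] * 200
X = [[rng.gauss(1.5 * t, 1.0), rng.gauss(-1.0 * t, 1.0)] for t in y]

oof = pooled_oof_scores(X, y)
pooled_auc = auc(y, oof)
sens, spec = sensitivity_specificity(y, oof)
print(f"pooled AUC={pooled_auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Pooling the held-out scores before computing the AUC yields a single estimate over all samples, rather than an average of five per-fold AUCs that can be unstable when folds contain few events.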
