An Explainable Self-Supervised Learning Framework for Interpretable and Accurate Heart Disease Prediction Using EDA–SimCLR–SHAP Pipeline

Shajedul Hasan Arman
Omar Faruque Siyam
Md. Faishal Ahmed Rudro
Afiah Rahman

Abstract

Accurate and interpretable prediction of heart disease risk is a core challenge in contemporary cardiovascular medicine, because early intervention significantly lowers mortality and treatment cost. Conventional supervised learning models generalize poorly on small, heterogeneous clinical datasets and do not offer understandable explanations for their predictions. To address these challenges, this research proposes an end-to-end EDA–SimCLR–SHAP pipeline that combines Exploratory Data Analysis (EDA), Self-Supervised Learning (SSL) with SimCLR, and Explainable AI (XAI) with SHAP for accurate and interpretable heart disease prediction. The pipeline begins with extensive statistical and correlation analysis to identify clinically relevant features; ST slope, chest pain type, and exercise-induced angina emerge as robust predictors. A SimCLR-based model is then pretrained in a self-supervised manner on unlabeled samples to learn useful latent representations, and is subsequently fine-tuned with a supervised classification head to make the final predictions. Evaluated on the Cleveland–Hungary Heart Disease Dataset (1,190 samples, 12 clinical features), the model achieves 90.76% test accuracy and an ROC–AUC of 0.936, outperforming conventional deep learning and gradient boosting baselines. SHAP-based feature attribution further improves interpretability by exposing feature influences that are consistent with established cardiological knowledge. By integrating representation learning, statistical validation, and explainability into a single method, this research bridges the gap between model performance and clinical interpretability and provides a reliable, data-efficient, and explainable AI paradigm for early cardiovascular risk detection and decision support in real-world clinical settings.
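
The self-supervised pretraining step can be illustrated with the contrastive objective that SimCLR uses, the normalized temperature-scaled cross-entropy (NT-Xent) loss. The sketch below is a minimal NumPy illustration, not the authors' implementation: the abstract does not state the augmentations, encoder, or temperature, so the `augment` function (Gaussian noise plus random feature masking), the linear stand-in encoder, and all default values are assumptions chosen for illustration, and the data is synthetic.

```python
import numpy as np


def augment(x, rng, noise_std=0.1, mask_prob=0.2):
    """Hypothetical tabular augmentation (an assumption, not from the paper):
    add Gaussian noise, then zero out each feature with probability mask_prob."""
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    keep = rng.random(x.shape) >= mask_prob
    return noisy * keep


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for two views of the same batch.

    z1, z2: arrays of shape (N, d); row i of z1 and row i of z2 are the
    two augmented views of sample i (a positive pair).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    # L2-normalize so that dot products are cosine similarities.
    norms = np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    z = z / norms
    sim = (z @ z.T) / temperature
    # A sample must not be contrasted with itself.
    np.fill_diagonal(sim, -np.inf)
    # The positive for row i is row i + n, and vice versa.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denom
    return float(-log_prob_pos.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a batch of unlabeled, standardized records.
    x = rng.normal(size=(32, 12))
    # Stand-in linear encoder; a real pipeline would train a neural network.
    w = rng.normal(size=(12, 8))
    z1 = augment(x, rng) @ w
    z2 = augment(x, rng) @ w
    print("NT-Xent loss on random encoder:", nt_xent_loss(z1, z2))
```

In a full pipeline, this loss would be minimized with respect to the encoder parameters during pretraining, after which a classification head would be attached and trained on the labeled samples.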

Version published to 10.21203/rs.3.rs-7890495/v1 on Research Square
Oct 22, 2025

Related articles

Unified approach for Accurate Heart Disease Prediction using Machine Learning Techniques

This article has 4 authors:
1. Raghavendra Rao RV
2. Ram Mohan Reddy Ch
3. Hemanth K
4. Hruthik Chavan D
This article has no evaluations. Latest version: Oct 28, 2025

Explainable Machine Learning Models for Predicting Health-Related Quality of Life in High-Risk Cardiovascular Populations: A Comparative Analysis of SF-12 Data and Clinical Risk Stratification

This article has 7 authors:
1. Guoliang Ma
2. Xin Hong
3. Lin Zhu
4. Wenting Li
5. Zhuanzhuan Fan
6. Kun Li
7. Wenyan Wang
This article has no evaluations. Latest version: Sep 30, 2025

Supervised Learning for Predicting Unknown Modifying Variables in Pliable Lasso: Applications to High-Dimensional Datasets

This article has 3 authors:
1. Zainab Subhi Mahmood Hawrami
2. Mehmet Ali Cengiz
3. Emre Dünder
This article has no evaluations. Latest version: Sep 12, 2025
