An Explainable Self-Supervised Learning Framework for Interpretable and Accurate Heart Disease Prediction Using EDA–SimCLR–SHAP Pipeline

Abstract

Accurate and interpretable risk prediction for heart disease is a core challenge in contemporary cardiovascular medicine, as early intervention significantly lowers mortality and treatment cost. Conventional supervised learning models generalize poorly on small, heterogeneous clinical datasets and do not offer understandable explanations for their predictions. To address these challenges, this research proposes an end-to-end EDA–SimCLR–SHAP pipeline that combines Exploratory Data Analysis (EDA), Self-Supervised Learning (SSL) with SimCLR, and Explainable AI (XAI) with SHAP for accurate and interpretable heart disease prediction. The pipeline begins with extensive statistical and correlation analysis to identify clinically relevant features; ST slope, chest pain type, and exercise-induced angina emerge as robust predictors. A SimCLR-based model is then pretrained in a self-supervised manner on unlabeled samples to learn useful latent representations, and is subsequently fine-tuned with a supervised classification head to make the final predictions. Evaluated on the Cleveland–Hungary Heart Disease Dataset (1,190 samples, 12 clinical features), the model achieves 90.76% test accuracy and a ROC–AUC of 0.936, outperforming conventional deep learning and gradient boosting baselines. SHAP-based feature attribution further improves interpretability by exposing feature influences that are consistent with established cardiological knowledge. By integrating representation learning, statistical validation, and explainability into a single method, this research bridges the gap between model performance and clinical interpretability and provides a reliable, data-efficient, and explainable AI approach for early cardiovascular risk detection and decision support in real-world clinical settings.
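
To make the self-supervised pretraining step more concrete, the following is a minimal NumPy sketch of the contrastive (NT-Xent) objective that SimCLR-style methods optimize. The abstract does not state the augmentations, encoder, or temperature used, so everything here is an assumption for illustration: the `augment` function (Gaussian noise plus random feature masking), its `noise_std` and `mask_prob` parameters, and the `temperature` default are hypothetical choices, not the authors' settings.

```python
import numpy as np


def augment(x, rng, noise_std=0.1, mask_prob=0.2):
    """Hypothetical tabular augmentation (an assumption, not the paper's):
    add Gaussian noise, then zero out a random subset of features."""
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    keep = rng.random(x.shape) >= mask_prob
    return noisy * keep


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for paired embeddings, where z1[i] and z2[i] are two
    views of the same sample. Both arrays have shape (n, d)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    # L2-normalize so the dot product is cosine similarity.
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    sim = (z @ z.T) / temperature
    # A sample must not be contrasted against itself.
    np.fill_diagonal(sim, -np.inf)
    # Row i's positive is the other view of the same sample.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm
    return float(-log_prob[np.arange(2 * n), pos].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a batch of standardized clinical records (random data).
    batch = rng.normal(size=(8, 12))
    view_a, view_b = augment(batch, rng), augment(batch, rng)
    # In a real pipeline the views would first pass through an encoder
    # and projection head; here the raw views are used as "embeddings".
    print(nt_xent_loss(view_a, view_b))
```

In a full pipeline, the loss would be minimized with respect to the encoder parameters using an automatic differentiation framework, and the pretrained encoder would then be fine-tuned with a classification head on the labeled samples.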
