Comparative Evaluation of SHAP and LIME for Clinical Interpretability in Postoperative Cardiac Surgery Mortality Prediction Models
Abstract
Aims: Despite their strong predictive performance, complex machine learning (ML) models are often criticized for their lack of interpretability, especially in high-stakes clinical settings. This study compares two leading explainable artificial intelligence (XAI) methods, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), applied to a validated XGBoost model for 30-day mortality prediction after cardiac surgery. We explore their ability to provide transparent, clinically meaningful insights that support medical decision-making.

Methods and Results: Building upon a previously developed XGBoost model with high discrimination (AUC-ROC = 0.964), we applied SHAP and LIME to interpret model predictions across five representative clinical cases: a true positive, a true negative, a false positive, a false negative, and a borderline prediction. For each case, visual outputs were generated, feature attributions were analyzed, and explanations were evaluated by clinical experts for clarity, trust, and alignment with medical reasoning. SHAP consistently identified relevant risk contributors such as MACE, creatinine, and frailty indicators, and offered both global and local interpretability. LIME explanations were more concise but showed variability in feature attribution, often omitting clinically significant variables such as MACE. In the false-negative and borderline cases, SHAP provided clearer representations of risk, whereas LIME tended to oversimplify or misattribute contributing features. Clinician feedback favored SHAP in all dimensions evaluated.

Conclusion: Our results suggest that SHAP outperforms LIME in providing clinically aligned, trustworthy, and interpretable explanations for ML-based mortality prediction in cardiac surgery. While both methods can enhance transparency, SHAP's consistency and richer information content make it better suited for complex clinical use. These findings support the integration of SHAP-based interpretability tools into clinical decision support systems, particularly in scenarios where trust and explanation fidelity are critical for patient safety and risk communication.
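To make the workflow described in the Methods concrete, the sketch below shows how SHAP and LIME are typically applied to an XGBoost classifier for per-case feature attribution. It is a minimal illustration, not the authors' code: the synthetic data, feature names (age, creatinine, prior_MACE, frailty_score, ejection_fraction), and hyperparameters are assumptions standing in for the unavailable cardiac-surgery cohort and validated model.

```python
# Minimal sketch, assuming synthetic data in place of the study cohort.
import numpy as np
import xgboost as xgb
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.model_selection import train_test_split

# Placeholder features standing in for the clinical predictors named in the abstract.
rng = np.random.default_rng(0)
feature_names = ["age", "creatinine", "prior_MACE", "frailty_score", "ejection_fraction"]
X = rng.normal(size=(1000, len(feature_names)))
y = (X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=1000) > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-boosted classifier as a stand-in for the validated mortality model.
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)

# SHAP: exact tree-based attributions, usable both globally and per case.
shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X_test)              # one value per feature per case
shap.summary_plot(shap_values, X_test, feature_names=feature_names)  # global ranking of contributors

# LIME: local surrogate explanation for a single prediction (e.g. a borderline case).
lime_explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["survived", "died"],
    mode="classification",
)
case = X_test[0]
lime_exp = lime_explainer.explain_instance(case, model.predict_proba, num_features=5)
print(lime_exp.as_list())                                     # top weighted features for this case
```

In this setup the SHAP output attributes every feature for every case against a common baseline, while the LIME output is a small weighted feature list fitted to a local surrogate, which mirrors the abstract's contrast between SHAP's richer global-plus-local view and LIME's more concise but more variable explanations.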