Model uncertainty quantification: A post hoc calibration approach for heart disease prediction
Abstract
We investigate whether post-hoc calibration improves the clinical trustworthiness of heart-disease predictions beyond conventional accuracy metrics. Using a structured clinical dataset (1,025 records; 85/15 train-test split), we benchmarked six classifiers (logistic regression, SVM, k-nearest neighbors, naïve Bayes, random forest, and XGBoost) on accuracy, ROC-AUC, precision, recall, and F1, and then evaluated probability quality before and after Platt (sigmoid) and isotonic calibration using the Brier score, expected calibration error (ECE), log loss, Spiegelhalter's Z-test, and reliability diagrams. Baseline discrimination was high (e.g., SVM: accuracy 92.9%, ROC-AUC 99.4%, F1 92.8%), and the ensembles achieved perfect test-set scores (random forest and XGBoost: 100% across all metrics), which motivated the calibration analysis. Isotonic calibration consistently improved probability quality for most models: random forest Brier score from 0.007 to 0.002, ECE from 0.051 to 0.011, log loss from 0.056 to 0.012; naïve Bayes Brier score from 0.162 to 0.132, ECE from 0.145 to 0.118, log loss from 1.936 to 0.446; SVM ECE from 0.086 to 0.044 and log loss from 0.142 to 0.133. Platt scaling helped some models but occasionally worsened calibration (e.g., KNN ECE from 0.035 to 0.081). Reliability diagrams corroborated these trends, with isotonic calibration yielding curves closer to the 45° diagonal, while Spiegelhalter's test moved toward non-significance for several models after calibration. Overall, isotonic calibration delivered the most consistent gains in probability reliability while preserving discrimination, strengthening the interpretability and clinical actionability of model outputs.
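The post-hoc calibration workflow described above can be sketched in a few lines with scikit-learn. The snippet below is a minimal illustration only: it uses a synthetic binary dataset in place of the heart-disease records (which are not reproduced here), a random forest as the base classifier, and a simple binned ECE helper, since ECE is not built into scikit-learn. It compares Platt scaling (`method="sigmoid"`) against isotonic regression (`method="isotonic"`) on Brier score, ECE, and log loss, mirroring the paper's evaluation.

```python
# Hedged sketch of post-hoc calibration (Platt vs. isotonic) on synthetic
# data; dataset, model choice, and bin count are illustrative assumptions.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE: |bin accuracy - bin confidence| weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # First bin is closed on the left so p == 0.0 is not dropped.
        mask = (y_prob <= hi) if lo == 0.0 else (y_prob > lo) & (y_prob <= hi)
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - y_prob[mask].mean())
    return ece

# Synthetic stand-in for the clinical dataset: 1,025 rows, 85/15 split.
X, y = make_classification(n_samples=1025, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.15, random_state=0, stratify=y)

base = RandomForestClassifier(random_state=0)
for method in ("sigmoid", "isotonic"):  # Platt scaling vs. isotonic regression
    cal = CalibratedClassifierCV(base, method=method, cv=5).fit(X_tr, y_tr)
    p = cal.predict_proba(X_te)[:, 1]
    print(f"{method:>8}: Brier={brier_score_loss(y_te, p):.3f} "
          f"ECE={expected_calibration_error(y_te, p):.3f} "
          f"log-loss={log_loss(y_te, p):.3f}")
```

Reliability diagrams over the same held-out probabilities (e.g., via `sklearn.calibration.calibration_curve`) would show whether each method pulls the curve toward the 45° diagonal, as reported for isotonic calibration.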