Calibrated and Interpretable Machine Learning for ICU Mortality Prediction Using First 24-Hour Clinical Data
Abstract
Objective
To develop, calibrate, and interpret machine learning models for predicting in-hospital mortality among intensive care unit (ICU) patients using clinical data from the first 24 hours of admission.
Methods
We analyzed 53,866 adult ICU admissions from MIMIC-IV (v2.2), including 5,787 in-hospital deaths (10.7%). An enhanced feature-engineering pipeline generated 88 laboratory features capturing distributional characteristics, temporal trends, and measurement frequency. Five classifiers were evaluated: ℓ2-regularized logistic regression, random forest, XGBoost, LightGBM, and a calibrated soft-voting ensemble. Models were developed using a stratified 64:8:8:20 split into training, validation (hyperparameter tuning), calibration, and test sets. Performance was assessed on the held-out test set (n = 10,774) using AUROC, AUPRC, Brier score, calibration analysis, decision curve analysis (DCA), and SHAP-based interpretation.
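As an illustration of the four-way stratified split described above, the following minimal sketch partitions row indices per outcome class so that each partition keeps roughly the same event rate. It is not the authors' pipeline: the function name `stratified_split`, the seed, and the synthetic labels are assumptions made for the example, and only the 64:8:8:20 proportions and the approximate 10.7% event rate come from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.64, 0.08, 0.08, 0.20), seed=0):
    """Split row indices into len(fractions) parts, preserving class proportions.

    Hypothetical helper for illustration; the fractions mirror the
    train / validation / calibration / test proportions in the text.
    """
    rng = random.Random(seed)

    # Group row indices by outcome class (0 = survived, 1 = died).
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    parts = [[] for _ in fractions]
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        start, cumulative = 0, 0.0
        for k, frac in enumerate(fractions):
            cumulative += frac
            # The last part takes the remainder so that no index is dropped.
            end = n if k == len(fractions) - 1 else round(cumulative * n)
            parts[k].extend(indices[start:end])
            start = end
    return parts


# Synthetic labels with roughly the cohort's event rate (illustrative only).
labels = [1] * 107 + [0] * 893
train, val, calib, test = stratified_split(labels)
```

Keeping a separate calibration partition means the calibration mapping is fitted on data that was used neither for model fitting nor for hyperparameter tuning, while the test partition stays untouched until final evaluation.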
Results
The calibrated ensemble achieved the best overall performance (AUROC 0.856, 95% CI 0.846–0.867; AUPRC 0.449, 95% CI 0.418–0.480) with a Brier score of 0.078. XGBoost (AUROC 0.856; AUPRC 0.435) and LightGBM (AUROC 0.854; AUPRC 0.436) performed comparably to the ensemble and significantly outperformed logistic regression (AUROC 0.823; AUPRC 0.376), yielding absolute AUROC improvements of approximately 0.031–0.033 (p < 0.001). Calibration reduced Brier scores by 42% for XGBoost (0.134 to 0.078) and 50% for LightGBM (0.151 to 0.076). Decision curve analysis demonstrated consistent net benefit across the 5%–20% risk-threshold range. Key predictors included age, blood urea nitrogen, ICU subtype, measurement frequency, and lactate-related features. Performance was consistent across ICU subtypes (AUROC > 0.79).
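For readers less familiar with the metrics reported above, the sketch below shows the standard definitions of the Brier score and of the net benefit used in decision curve analysis, plus the relative-reduction arithmetic behind the 42% and 50% figures. The function names and the ten-patient toy data are assumptions for illustration only; they are not the study's code or data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    Standard decision-curve definition: TP/N - FP/N * pt / (1 - pt).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Toy data (hypothetical): 10 patients, 2 deaths.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_prob = [0.6, 0.1, 0.05, 0.3, 0.4, 0.02, 0.08, 0.15, 0.03, 0.12]

toy_brier = brier_score(y_true, y_prob)
toy_nb_model = net_benefit(y_true, y_prob, threshold=0.20)
toy_nb_all = net_benefit_treat_all(y_true, threshold=0.20)

# Brier-score reductions reported in the text.
xgb_reduction = relative_reduction(0.134, 0.078)   # about 0.42
lgbm_reduction = relative_reduction(0.151, 0.076)  # about 0.50
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none strategy); a decision curve repeats this comparison across a range of thresholds, such as the 5%–20% range reported here.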
Conclusion
A calibrated and interpretable machine learning framework using early ICU data provides accurate and clinically actionable mortality risk estimates. By integrating trajectory-aware feature engineering, probabilistic calibration, and decision-analytic evaluation, this approach advances ICU mortality prediction toward reliable clinical decision support.