Calibrated and Interpretable Machine Learning for ICU Mortality Prediction Using First 24-Hour Clinical Data

Abdallah Alsammani
Merasia Johnson
Jessica Elrefaei

Abstract

Objective

To develop, calibrate, and interpret machine learning models for predicting in-hospital mortality among intensive care unit (ICU) patients using clinical data from the first 24 hours of admission.

Methods

We analyzed 53,866 adult ICU admissions from MIMIC-IV (v2.2), including 5,787 in-hospital deaths (10.7%). An enhanced feature-engineering pipeline generated 88 laboratory features capturing distributional characteristics, temporal trends, and measurement frequency. Five classifiers were evaluated: ℓ₂-regularized logistic regression, random forest, XGBoost, LightGBM, and a calibrated soft-voting ensemble. Models were developed using a stratified 64:8:8:20 split into training, validation (hyperparameter tuning), calibration, and test sets. Performance was assessed on the held-out test set (n = 10,774) using AUROC, AUPRC, Brier score, calibration analysis, decision curve analysis (DCA), and SHAP-based interpretation.
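As an illustration of the four-way stratified split described in the Methods, the following is a minimal sketch using only the Python standard library. It is not the authors' pipeline: the function name `stratified_split`, the seed, and the rounding rule are assumptions made for the example. The idea is to shuffle each outcome class separately and cut it at the cumulative 64%, 72%, and 80% marks, so every partition keeps roughly the same mortality rate.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.64, 0.08, 0.08, 0.20), seed=0):
    """Split indices into len(fractions) parts, preserving class proportions.

    Hypothetical helper for illustration; returns a list of index lists in the
    order train, validation, calibration, test.
    """
    rng = random.Random(seed)

    # Group row indices by outcome label (e.g. 1 = died, 0 = survived).
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    parts = [[] for _ in fractions]
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        start, cumulative = 0, 0.0
        for k, frac in enumerate(fractions):
            cumulative += frac
            # The last part takes the remainder so that no index is dropped.
            end = n if k == len(fractions) - 1 else round(cumulative * n)
            parts[k].extend(indices[start:end])
            start = end
    return parts


# Toy cohort with the same 10.7% event rate as the study (not real data).
toy_labels = [1] * 107 + [0] * 893
train, valid, calib, test = stratified_split(toy_labels)
print(len(train), len(valid), len(calib), len(test))
```

For scale, 20% of the 53,866 admissions is about 10,773, which is in line with the reported test set of 10,774; the one-admission difference depends on the rounding convention of whichever splitting routine is used.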

Results

The calibrated ensemble achieved the best overall performance (AUROC 0.856, 95% CI 0.846–0.867; AUPRC 0.449, 95% CI 0.418–0.480) with a Brier score of 0.078. XGBoost (AUROC 0.856; AUPRC 0.435) and LightGBM (AUROC 0.854; AUPRC 0.436) performed comparably to the ensemble and significantly outperformed logistic regression (AUROC 0.823; AUPRC 0.376), yielding absolute AUROC improvements of approximately 0.031–0.033 (p < 0.001). Calibration reduced Brier scores by 42% for XGBoost (0.134 to 0.078) and 50% for LightGBM (0.151 to 0.076). Decision curve analysis demonstrated consistent net benefit across the 5%–20% risk-threshold range. Key predictors included age, blood urea nitrogen, ICU subtype, measurement frequency, and lactate-related features, with consistent performance across ICU subtypes (AUROC > 0.79).
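To make the calibration and decision-curve figures above concrete, here is a minimal sketch of the standard definitions, again in plain Python. The function names and the toy predictions are assumptions for illustration only; the abstract does not describe the authors' implementation. The Brier score is the mean squared difference between predicted probability and outcome, and the net benefit at threshold t is TP/n − (FP/n) · t/(1 − t), which is the quantity plotted in a decision curve.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the chosen threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Reported Brier scores before and after calibration (from the Results).
print(f"XGBoost:  {relative_reduction(0.134, 0.078):.1%}")
print(f"LightGBM: {relative_reduction(0.151, 0.076):.1%}")

# Toy predictions (not study data) evaluated at a 10% risk threshold.
y = [1, 0, 0, 1]
p = [0.9, 0.3, 0.05, 0.15]
print(brier_score(y, p), net_benefit(y, p, 0.10))
```

The relative reductions work out to about 41.8% and 49.7%, consistent with the rounded 42% and 50% quoted in the Results. Evaluating `net_benefit` over a grid of thresholds from 0.05 to 0.20 would trace the range examined in the decision curve analysis.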

Conclusion

A calibrated and interpretable machine learning framework using early ICU data provides accurate and clinically actionable mortality risk estimates. By integrating trajectory-aware feature engineering, probabilistic calibration, and decision-analytic evaluation, this approach advances ICU mortality prediction toward reliable clinical decision support.

Version published to 10.64898/2026.05.30.26354524 on medRxiv
Jun 2, 2026

Related articles

ADVISE: A Machine Learning Framework for Early Recognition of a Surrogate Marker for Ventilator-Associated Pneumonia Using Routinely Collected Critical Care Data

This article has 5 authors:
1. Nabeel Amiruddin
2. Sophie Mellor
3. Rehman Crisp
4. Ananya Nair
5. Maya Patel
This article has no evaluations. Latest version: Jun 24, 2026

Temporal Feature Engineering and Ensemble Learning for Predicting 28-Day Mortality in ICU Patients with Alcoholic Cirrhosis

This article has 7 authors:
1. Janet Sanjaya
2. Mohammadsaeed Haghi
3. Nausin Kudrot
4. Sakshie Pathak
5. Shreyas V. Chandramouli
6. Kamiar Alaei
7. Maryam Pishgar
This article has no evaluations. Latest version: Jul 2, 2026

Early identification of advanced chronicity (MACA) patients using Machine Learning models: a population-based predictive approach for proactive care stratification

This article has 15 authors:
1. Miguel Boubeta
2. Marc Moreno Ariño
3. Oscar Duems Noriega
4. Marina Roig Soronellas
5. Julia Veríssimo Guillén
6. Ingrid Bullich Marín
7. Carina Sanz Blazquez
8. Joana Barrio Medina
9. Montserrat López Postigo
10. Luis Lorenzo
11. Marcos Montaña-Méndez
12. Cristóbal Bernardo-Castiñeira
13. María Dolores López Lores
14. Víctor Borrás-Marco
15. Sandra Posas Paradera
This article has no evaluations. Latest version: Jun 26, 2026
