Multimodal ICU In-hospital Mortality Prediction: A Scoping Review of Transparency, Calibration, and Translation Readiness

Alexander Bakumenko
Janine Hoelscher
D. Hudson Smith
Svetlana Bakumenko
Aaron J. Masino
Jihad S. Obeid

Abstract

Background: Multimodal machine learning for intensive care unit (ICU) in-hospital mortality (IHM) prediction has rapidly expanded, yet reporting practices relevant to clinical readiness, including calibration, clinical utility, robustness to data missingness, and actionable transparency, remain heterogeneous.

Objective: To map the landscape of multimodal IHM prediction studies and synthesize evidence on (i) fusion architectures and transparency implementations, (ii) evaluation beyond discrimination (calibration, clinical utility, and per-case uncertainty), and (iii) operational readiness (missing-modality handling, prediction timeliness, external validation, subgroup evaluation, and reproducibility artifacts).

Methods: We conducted a scoping review of peer-reviewed studies (English, January 2010 – November 2025) that fused at least two data modalities by provenance and predicted IHM in adult ICU populations. Searches were performed in PubMed, Scopus, and Google Scholar. Screening and extraction used an established AI-assisted workflow with two-reviewer verification. We summarized extracted fields from 115 included studies using descriptive statistics and narrative synthesis, including thematic analysis of author-reported gaps, limitations, and future directions.

Results: Among 115 included studies, feature-level (early) fusion predominated (n = 80, 69.6%), followed by intermediate fusion (n = 30, 26.1%), late fusion (n = 4, 3.5%), and hybrid fusion (n = 1, 0.9%). The modality landscape was dominated by physiologic time-series and coded/tabular EHR data; clinical text, imaging, and high-frequency waveforms each appeared in a minority of studies. Feature-level interpretability was reported in 21 studies (18.3%), with full per-case attribution in only 13 (11.3%); modality-level attribution was identified in 2 studies (1.7%), and missing-modality policies in 7 (6.1%). Calibration was assessed in 40 studies (34.8%), decision curve analysis in 20 (17.4%), and per-case uncertainty quantification in 1 (0.9%). External validation was performed in 28 studies (24.3%), prediction timeliness was evaluated in 22 (19.1%), and code repositories were provided in 29 (25.2%); no study reported a structured model card.

Conclusions: The multimodal IHM literature emphasizes discrimination performance and early fusion, yet the evidence base remains immature for clinical translation, with systematic gaps in calibration, clinical utility, per-case transparency, missing-modality robustness, and reproducibility. These findings motivate standardized reporting and evaluation frameworks tailored to multimodal ICU prediction.
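For readers less familiar with the evaluations "beyond discrimination" that the abstract names, the following is a minimal illustrative sketch of what calibration assessment and decision curve analysis compute. It is not taken from the review or from any included study; the function names, the equal-width binning scheme, and the toy data are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width risk bins (an illustrative choice).

    Returns (mean predicted risk, observed event rate, count) for each
    non-empty bin; a well-calibrated model has the first two values close.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    summary = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            obs_rate = sum(y for y, _ in b) / len(b)
            summary.append((mean_pred, obs_rate, len(b)))
    return summary


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Decision-curve net benefit of acting on patients with risk >= threshold.

    True positives are credited and false positives are penalized by the
    odds of the threshold, threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = died in hospital, 0 = survived.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]

print(brier_score(outcomes, risks))
print(calibration_bins(outcomes, risks, n_bins=2))
print(net_benefit(outcomes, risks, threshold=0.3))
```

A decision curve is obtained by evaluating net_benefit over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies; published analyses typically add confidence intervals and smoothed calibration curves, which this sketch omits.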

Version published to 10.21203/rs.3.rs-9217081/v1 on Research Square
Mar 26, 2026

Related articles

From Model Performance to Screening Readiness: An Audit-Grade Evidence Mapping Study in Prediabetes

This article has 1 author:
1. Mohammad Alshantti
This article has no evaluations
Latest version Feb 4, 2026

The Evolution of Delirium Prediction in the Intensive Care Unit: A Systematic Review of Traditional, Machine Learning, and Deep Learning Models

This article has 2 authors:
1. Shihshuan Fang
2. Sheng-Han Chen
This article has no evaluations
Latest version Mar 11, 2026

Evaluating the predictive power of a machine learning to predict the need for neonatal resuscitation

This article has 7 authors:
1. Mojdeh Banaei
2. Nasibeh Roozbeh
3. Fatemeh Abdi
4. Fatemeh Darsareh
5. Vahid Mehrnoush
6. Farideh Montazeri
7. Mohammadsadegh Vahidi Farashah
This article has no evaluations
Latest version Feb 12, 2026
