Multimodal ICU In-hospital Mortality Prediction: A Scoping Review of Transparency, Calibration, and Translation Readiness
Abstract
Background: Multimodal machine learning for intensive care unit (ICU) in-hospital mortality (IHM) prediction has rapidly expanded, yet reporting practices relevant to clinical readiness, including calibration, clinical utility, robustness to data missingness, and actionable transparency, remain heterogeneous.

Objective: To map the landscape of multimodal IHM prediction studies and synthesize evidence on (i) fusion architectures and transparency implementations, (ii) evaluation beyond discrimination (calibration, clinical utility, and per-case uncertainty), and (iii) operational readiness (missing-modality handling, prediction timeliness, external validation, subgroup evaluation, and reproducibility artifacts).

Methods: We conducted a scoping review of peer-reviewed studies (English, January 2010 – November 2025) that fused at least two data modalities by provenance and predicted IHM in adult ICU populations. Searches were performed in PubMed, Scopus, and Google Scholar. Screening and extraction used an established AI-assisted workflow with two-reviewer verification. We summarized extracted fields from 115 included studies using descriptive statistics and narrative synthesis, including thematic analysis of author-reported gaps, limitations, and future directions.

Results: Among 115 included studies, feature-level (early) fusion predominated (n = 80, 69.6%), followed by intermediate fusion (n = 30, 26.1%), late fusion (n = 4, 3.5%), and hybrid fusion (n = 1, 0.9%). The modality landscape was dominated by physiologic time-series and coded/tabular EHR data; clinical text, imaging, and high-frequency waveforms each appeared in a minority of studies. Feature-level interpretability was reported in 21 studies (18.3%), with full per-case attribution in only 13 (11.3%); modality-level attribution was identified in 2 studies (1.7%), and missing-modality policies in 7 (6.1%).
Calibration was assessed in 40 studies (34.8%), decision curve analysis in 20 (17.4%), and per-case uncertainty quantification in 1 (0.9%). External validation was performed in 28 studies (24.3%), prediction timeliness was evaluated in 22 (19.1%), and code repositories were provided in 29 (25.2%); no study reported a structured model card.

Conclusions: The multimodal IHM literature emphasizes discrimination performance and early fusion, yet the evidence base remains immature for clinical translation, with systematic gaps in calibration, clinical utility, per-case transparency, missing-modality robustness, and reproducibility. These findings motivate standardized reporting and evaluation frameworks tailored to multimodal ICU prediction.
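The abstract reports how rarely studies went beyond discrimination but does not define the metrics involved. As a hedged illustration only, the minimal Python sketch below uses common textbook definitions: the Brier score, a binned reliability table (mean predicted risk against observed mortality), and decision-curve net benefit at a threshold probability p_t, NB(p_t) = TP/n - (FP/n) * p_t / (1 - p_t), compared with a treat-all strategy. The function names, bin count, threshold, and toy cohort are hypothetical; they are not the review's analysis code and are not drawn from any included study.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome (1 = death)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_table(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """(mean predicted risk, observed mortality rate, count) for each non-empty equal-width bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # a risk of exactly 1.0 goes in the top bin
        bins[idx].append((y, p))
    return [
        (sum(p for _, p in b) / len(b), sum(y for y, _ in b) / len(b), len(b))
        for b in bins
        if b
    ]


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging every patient whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy cohort: 8 ICU stays, outcome 1 = in-hospital death.
y = [0, 0, 1, 0, 1, 1, 0, 0]
risk = [0.10, 0.20, 0.70, 0.30, 0.90, 0.60, 0.40, 0.05]

brier = brier_score(y, risk)
table = reliability_table(y, risk)
nb_model = net_benefit(y, risk, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)

print(f"Brier score: {brier:.4f}")
for mean_pred, observed, count in table:
    print(f"predicted {mean_pred:.2f} vs observed {observed:.2f} (n={count})")
print(f"Net benefit at p_t=0.5: model {nb_model:.3f}, treat-all {nb_all:.3f}, treat-none 0.000")
```

A single threshold is shown for brevity; a decision curve would normally evaluate net benefit across a clinically plausible range of thresholds and compare the model against both the treat-all curve and treat-none (net benefit 0).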