The Evolution of Delirium Prediction in the Intensive Care Unit: A Systematic Review of Traditional, Machine Learning, and Deep Learning Models

Abstract

Background: Delirium is a prevalent and severe form of acute brain dysfunction in the Intensive Care Unit (ICU) and is linked to poor patient outcomes. Early and accurate prediction is crucial for implementing preventive strategies. Traditional statistical models have been foundational, but Machine Learning (ML) and Deep Learning (DL) have introduced a new paradigm in predictive analytics. This systematic review synthesizes the evolution of ICU delirium prediction models, from traditional statistical methods to modern ML and DL architectures, and evaluates their performance, methodological rigor, and clinical applicability.

Main body: We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched PubMed/MEDLINE, Embase, Web of Science, and the Cochrane Library for studies published between January 2015 and February 2026. We included studies that developed or validated a prediction model for delirium in adult ICU patients. We extracted data on study characteristics, model architecture, performance metrics (e.g., area under the receiver operating characteristic curve, AUROC), and predictors. Methodological quality was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). The search yielded 4,215 unique records, of which 36 studies were included in the qualitative synthesis. The review identified a clear progression from static, logistic regression-based models such as PRE-DELIRIC (AUROC ~0.71-0.89) to dynamic, interpretable ML models. Modern ML models, particularly XGBoost and Random Forest, consistently achieve high discrimination (AUROC ~0.80-0.91). More recently, DL architectures such as Temporal Convolutional Networks (TCNs) and models incorporating attention mechanisms have been shown to capture complex temporal dependencies in Electronic Health Record (EHR) data, with some achieving AUROCs up to 0.86. Key predictors identified consistently across all model types include age, severity-of-illness scores (e.g., APACHE II, SOFA), Glasgow Coma Scale (GCS) score, mechanical ventilation, and sedative use. Interpretability frameworks such as SHAP have become more common, addressing the "black box" nature of complex models.

Conclusion: ICU delirium prediction has advanced significantly, moving towards more dynamic, data-driven, and interpretable models. Although ML and DL models report higher discrimination than traditional models, challenges remain in external validation, clinical integration, and prospective evaluation. Future research should focus on validating these advanced models in diverse clinical settings and on developing implementation strategies that translate predictive power into improved patient care.
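
The AUROC values quoted in the abstract can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who develops delirium than to a randomly chosen patient who does not. The sketch below illustrates that interpretation only. It is not taken from any of the reviewed studies; the function name `auroc`, the labels, and the scores are all hypothetical toy values chosen for illustration.

```python
def auroc(labels, scores):
    """Pairwise (rank-based) AUROC.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = delirium, 0 = no delirium (not from the review).
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.80, 0.20, 0.30]

example_auroc = auroc(labels, scores)
print(f"AUROC = {example_auroc:.4f}")  # 13 of 16 pairs ranked correctly: 0.8125
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of patients, so it suits small examples; for real cohorts a library routine would normally be used instead.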
