XGBoost-Based Prediction of ICU Mortality in Sepsis-Associated Acute Kidney Injury Patients Using MIMIC-IV Database with Validation from eICU Database


Abstract

Background

Sepsis-Associated Acute Kidney Injury (SA-AKI) leads to high mortality in intensive care. This study develops machine learning models using the Medical Information Mart for Intensive Care IV (MIMIC-IV) database to predict Intensive Care Unit (ICU) mortality in SA-AKI patients. External validation is conducted using the eICU Collaborative Research Database.

Methods

For the 9,474 SA-AKI patients identified in MIMIC-IV, candidate features such as laboratory results, vital signs, and comorbidities were screened using Variance Inflation Factor (VIF), Recursive Feature Elimination (RFE), and expert clinical input, narrowing the set to 24 predictive variables. An Extreme Gradient Boosting (XGBoost) model was built for in-hospital mortality prediction, with hyperparameters optimized by grid search (GridSearchCV). Model interpretability was enhanced with SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). External validation was conducted using the eICU database.
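The VIF screening step mentioned above can be illustrated with a minimal sketch. It assumes a common convention (iteratively dropping the feature with the largest VIF until all values fall below 10); the abstract does not state the threshold or procedure actually used, and the function names, threshold, and synthetic data here are hypothetical. RFE and the grid search are not shown.

```python
import numpy as np

def vif_scores(X):
    """Return the VIF of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on all other features plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # VIF_j = 1 / (1 - R_j^2); perfectly collinear features get infinity.
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores

def drop_high_vif(X, names, threshold=10.0):
    """Drop the feature with the largest VIF until all VIFs are <= threshold.

    The threshold of 10 is a rule of thumb assumed for illustration only.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        scores = vif_scores(X)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

# Synthetic demonstration data (not from MIMIC-IV): feature "c" is almost
# exactly a + b, so the three columns are strongly collinear.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.01 * rng.normal(size=500)
X = np.column_stack([a, b, c])

X_kept, kept_names = drop_high_vif(X, ["a", "b", "c"])
print("kept features:", kept_names)
print("remaining VIFs:", vif_scores(X_kept))
```

Dropping one feature at a time and recomputing matters because removing a single collinear variable can bring the VIFs of the remaining ones back to acceptable levels.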

Results

The proposed XGBoost model achieved an internal Area Under the Receiver Operating Characteristic curve (AUROC) of 0.878 (95% Confidence Interval: 0.859–0.897). On external validation in the eICU database, the AUROC was 0.720 (95% Confidence Interval: 0.708–0.733). SHAP identified Sequential Organ Failure Assessment (SOFA) score, serum lactate, and respiratory rate as key mortality predictors. LIME highlighted serum lactate, Acute Physiology and Chronic Health Evaluation II (APACHE II) score, total urine output, and serum calcium as critical features.
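SHAP attributions are based on Shapley values. The following sketch computes exact Shapley values for a toy risk function by enumerating feature coalitions, with absent features set to a baseline value. This is a simplification for illustration: the toy function, its coefficients, and the patient and baseline values are hypothetical, and it is not the study's model or the algorithm the SHAP library uses for tree ensembles.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

def toy_risk(v):
    """Hypothetical risk score with made-up coefficients (illustration only)."""
    sofa, lactate, resp_rate = v
    return 0.30 * sofa + 0.50 * lactate + 0.05 * resp_rate + 0.10 * sofa * lactate

features = ["SOFA", "lactate", "respiratory rate"]
patient = [10.0, 4.0, 28.0]    # hypothetical patient
baseline = [4.0, 1.5, 18.0]    # hypothetical reference values

phi = shapley_values(toy_risk, patient, baseline)
for name, value in zip(features, phi):
    print(f"{name}: {value:+.3f}")
print("sum of attributions:", sum(phi))
print("f(patient) - f(baseline):", toy_risk(patient) - toy_risk(baseline))
```

The attributions sum to the difference between the prediction for the patient and the prediction at the baseline, which is the additivity property that makes per-feature SHAP values directly comparable.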

Conclusions

Combining multi-stage feature selection, the XGBoost algorithm, and SHAP/LIME explanations yielded an accurate and interpretable model for predicting SA-AKI mortality (AUROC 0.878 internally and 0.720 on external validation in eICU). It supports early identification of high-risk patients, enhancing clinical decision-making in intensive care. Future work should focus on improving the model's adaptability and versatility and on real-world application.

Highlights

  • The study implemented a robust machine learning pipeline for predicting ICU mortality in sepsis-associated acute kidney injury (SA-AKI) patients. This pipeline included advanced data preprocessing techniques, stratified imputation for handling missing values, and a three-stage feature selection strategy using Variance Inflation Factor (VIF), Recursive Feature Elimination (RFE), and expert clinical input. The optimized feature set was then used to train an XGBoost model with hyperparameter tuning via GridSearchCV, achieving high predictive accuracy with an AUROC of 0.878 (95% CI: 0.859–0.897) and enhanced clinical applicability. The interpretability analysis using SHAP and LIME identified critical features such as SOFA score, serum lactate, and respiratory rate as key mortality predictors.

  • The model was externally validated using the eICU Collaborative Research Database, confirming its generalizability and robustness across diverse patient populations with an AUROC of 0.720 (95% CI: 0.708–0.733). This transparent, data-driven approach supports early identification of high-risk patients, optimizing clinical decision-making and resource allocation in intensive care settings.
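The AUROC values above are reported with 95% confidence intervals. The abstract does not say how the intervals were obtained; one common option, assumed here purely for illustration, is a percentile bootstrap over patients. The sketch below uses synthetic labels and scores, not MIMIC-IV or eICU data, and the function names and resample count are hypothetical.

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney U) formulation, with tie handling."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    # Average 1-based ranks, so tied scores share the same rank.
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u_stat = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))

def bootstrap_auroc_ci(y_true, y_score, n_boot=500, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap confidence interval for AUROC."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    point = auroc(y_true, y_score)  # also validates that both classes exist
    rng = np.random.default_rng(seed)
    n = y_true.size
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # skip resamples that contain only one class
        stats.append(auroc(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return point, float(lower), float(upper)

# Synthetic stand-in for an external cohort: about 20% mortality, noisy scores.
rng = np.random.default_rng(42)
labels = (rng.random(2000) < 0.2).astype(int)
scores = labels * 1.0 + rng.normal(size=2000)

point, lower, upper = bootstrap_auroc_ci(labels, scores)
print(f"AUROC {point:.3f} (95% CI: {lower:.3f} to {upper:.3f})")
```

Resampling whole patients keeps each label paired with its predicted score, so the interval reflects sampling variability in the validation cohort rather than variability in model training.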
