A Comprehensive Machine Learning and SHAP Framework for Predicting ICU Length of Stay Using Non-Therapeutic Clinical Indicators
Abstract
Objective: Accurate prediction of Intensive Care Unit length of stay (ICU LOS) is central to operational efficiency, clinical decision-making, and patient outcomes in critical care. This study aims to develop a robust and interpretable predictive framework that combines a broad set of machine learning (ML) algorithms with the SHapley Additive exPlanations (SHAP) method, using only non-therapeutic clinical indicators available at admission.
Methods: We conducted a retrospective analysis of a curated cohort of 654 adult patients admitted to the ICU of a tertiary-care hospital. Thirty non-therapeutic indicators covering demographics, severity scores, comorbidities, and admission details were compiled. We implemented and tuned eight ML models: Linear Regression, Lasso Regression, Ridge Regression, Decision Tree, Random Forest, eXtreme Gradient Boosting (XGBoost), Support Vector Regression, and a Multi-Layer Perceptron (MLP) neural network. Model performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). The best-performing model was interpreted with SHAP for global feature importance and local explainability.
Results: Tree-based ensemble models, particularly XGBoost and Random Forest, outperformed the other models. XGBoost performed best, with an MAE of 2.15 days, an RMSE of 3.42 days, and an R² of 0.71. SHAP analysis identified the Sequential Organ Failure Assessment (SOFA) score, patient age, admission type, and specific comorbidities (metastatic cancer and congestive heart failure) as the most influential predictors. The Glasgow Coma Scale (GCS) score was also a critical factor: lower scores substantially increased the predicted LOS.
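The three evaluation metrics named in the Methods can be made concrete with a short sketch. It assumes the standard definitions (MAE as the mean absolute residual, RMSE as the square root of the mean squared residual, and R² as one minus the ratio of the residual to the total sum of squares); the function name `regression_metrics` and the four-patient values are hypothetical and are not study data. scikit-learn offers equivalent functions in `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return MAE, RMSE and R^2 for paired observed and predicted LOS values (days).

    Hypothetical helper for illustration; assumes the standard metric definitions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical observed vs. predicted LOS (days) for four patients, not study data.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]
metrics = regression_metrics(observed, predicted)
print(metrics)  # values: MAE 1.0, RMSE ≈ 1.2247, R2 ≈ 0.7
```

Because RMSE squares residuals before averaging, it is never smaller than MAE and equals it only when every absolute error is the same size; a gap between the two, as between the reported 3.42 and 2.15 days, indicates that some predictions miss by considerably more than the average.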
Conclusion: Integrating ML models with the SHAP framework yields an accurate and clinically interpretable tool for early prediction of ICU LOS. By identifying the key drivers of prolonged stay from readily available non-therapeutic data, this approach facilitates proactive clinical management and strategic resource planning, supporting operational efficiency in the ICU.
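SHAP attributes a single prediction to the input features using Shapley values: each feature receives its weighted average marginal contribution over all coalitions of the other features, and the attributions sum to the difference between the prediction and a reference prediction. The sketch below computes exact Shapley values by brute force for a hypothetical three-feature LOS model (`toy_los_model`, with made-up coefficients and a SOFA × GCS interaction). It is not the study's model or the `shap` library's algorithm, and it assumes a simple value function in which features outside a coalition are set to a single hypothetical reference patient.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Brute-force Shapley values of model(x) relative to one baseline input.

    A coalition's value is the model output when features in the coalition take
    their observed values and all other features take their baseline values.
    Requires O(n * 2^n) model calls, so it is only practical for a few features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))  # marginal contribution of i
    return phi


def toy_los_model(z: Sequence[float]) -> float:
    """Hypothetical LOS model in days; coefficients are illustrative, not from the study."""
    sofa, age, gcs_deficit = z  # gcs_deficit = 15 - GCS
    return 1.0 + 0.4 * sofa + 0.05 * age + 0.2 * sofa * gcs_deficit


patient = [10.0, 70.0, 5.0]   # SOFA 10, age 70, GCS 10 (hypothetical)
reference = [2.0, 60.0, 0.0]  # hypothetical reference patient
phi = exact_shapley(toy_los_model, patient, reference)
print(phi)  # approximately [7.2, 0.5, 6.0]
print(sum(phi), toy_los_model(patient) - toy_los_model(reference))  # both ≈ 13.7
```

Here the interaction term is split between SOFA (4.0 on top of its 3.2 linear share) and the GCS deficit (6.0), so a lower GCS raises this patient's predicted stay even though GCS has no standalone term. Enumeration grows exponentially with the number of features; for a tree ensemble such as XGBoost, the `shap` package's `TreeExplainer` computes these attributions efficiently.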