Improving Hospital Length of Stay Prediction through Heterogeneous Data Integration from MIMIC-III Records

Abstract

Accurate prediction of hospital length of stay (LoS) is vital for optimizing clinical workflows, resource allocation, and patient care. This study evaluates machine learning models for binary and multi-class LoS classification using structured clinical variables, physiological measurements, and unstructured clinical notes. Seven data configurations (Z, F, E, ZS, ZSF, ZSE, and ZSEF) were constructed from combinations of four feature groups: structured features (Z), comprising diagnoses, procedures, medications, laboratory tests, and microbiology results; MeSH-based symptoms (S); physiological signals (F); and textual representations (E). Five predictive models, namely Artificial Neural Networks (ANN), XGBoost, Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM), were applied with and without feature selection, in which categorical features and Bag-of-Words representations were reduced to varied dimensions. Results indicate that the base structured feature set (Z) alone yields strong predictive performance across tasks. Adding S, F, and E, individually or in combination, consistently improved performance, with the ZSEF configuration achieving the highest F1-scores and AUC values in most cases. Although SMOTE (Synthetic Minority Over-sampling Technique) did not substantially improve performance in the global setting encompassing all hospital admissions, it did improve performance in disease-specific cohorts, particularly for patients admitted with lung cancer. Among the evaluated models, XGBoost and ANN generalized best. These findings underscore the effectiveness of multimodal data integration and feature reduction techniques in advancing predictive modeling for hospital length of stay across diverse patient populations.
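
The naming of the seven configurations can be illustrated with a minimal sketch. It assumes that each configuration is formed by concatenating the feature blocks named by its letters; the abstract does not state how the blocks are combined, so this is an assumption. The function name `build_feature_vector`, the block sizes, and all values are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# The seven configurations listed in the abstract.
CONFIGURATIONS = ["Z", "F", "E", "ZS", "ZSF", "ZSE", "ZSEF"]


def build_feature_vector(blocks: Dict[str, List[float]], config: str) -> List[float]:
    """Concatenate the blocks named by each letter of `config`, in order.

    Assumption: configurations are plain concatenations of feature blocks.
    """
    vector: List[float] = []
    for key in config:
        if key not in blocks:
            raise KeyError(f"unknown feature block: {key!r}")
        vector.extend(blocks[key])
    return vector


# Hypothetical feature blocks for a single admission (illustrative values only).
example_blocks = {
    "Z": [1.0, 0.0, 1.0],       # structured features, e.g. encoded diagnoses
    "S": [0.0, 1.0],            # MeSH-based symptom indicators
    "F": [0.72, 0.31],          # physiological signal summaries
    "E": [0.1, 0.0, 0.4, 0.2],  # text representation, e.g. Bag-of-Words weights
}

vectors = {c: build_feature_vector(example_blocks, c) for c in CONFIGURATIONS}
for name, vec in vectors.items():
    print(name, len(vec))
```

Under this reading, each model would be trained once per configuration, so that differences in F1-score and AUC can be attributed to the added feature groups.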
