Development of the Short Hospitalization Predictor (SHoP) Machine Learning Model Across Two Hospitals
Abstract
Objective
To develop and evaluate open-source machine learning (ML) models for predicting short hospital stays (length of stay [LOS] under 48 or 72 hours) using only data available at the time of emergency department (ED) admission, with a novel application of target encoding to diagnostic codes.
Materials and Methods
We trained two ML algorithms (Random Forest and XGBoost) on electronic health record (EHR) data from two hospitals to predict short hospital stays. We employed an innovative weighted target encoding method that converts categorical International Classification of Diseases, 10th Revision (ICD-10) codes into numeric representations of their probabilistic contribution to LOS. We measured the area under the receiver operating characteristic curve (AUROC) for predicting LOS under 48 or 72 hours and compared it with that of logistic regression.
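The abstract does not give the exact weighting formula, so the following is only a minimal sketch of one common form of weighted (smoothed) target encoding. It assumes each code's observed short-stay rate is shrunk toward the overall rate in proportion to how rarely the code appears. The function names, the `smoothing` parameter, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def fit_weighted_target_encoding(codes, labels, smoothing=10.0):
    """Map each diagnostic code to a smoothed short-stay rate.

    Assumed formula (not taken from the paper):
        (positives + smoothing * prior) / (count + smoothing)
    where prior is the overall short-stay rate in the training data.
    """
    if len(codes) != len(labels) or not codes:
        raise ValueError("codes and labels must be non-empty and equal length")

    prior = sum(labels) / len(labels)
    counts = defaultdict(int)
    positives = defaultdict(int)
    for code, label in zip(codes, labels):
        counts[code] += 1
        positives[code] += label

    encoding = {
        code: (positives[code] + smoothing * prior) / (counts[code] + smoothing)
        for code in counts
    }
    return encoding, prior


def apply_encoding(codes, encoding, prior):
    """Encode codes; codes unseen during fitting fall back to the prior."""
    return [encoding.get(code, prior) for code in codes]


# Hypothetical toy training data: 1 = short stay, 0 = longer stay.
train_codes = ["J18.9", "J18.9", "J18.9", "I50.9", "I50.9", "N39.0"]
train_short_stay = [1, 1, 0, 0, 0, 1]

encoding, prior = fit_weighted_target_encoding(
    train_codes, train_short_stay, smoothing=2.0
)
encoded = apply_encoding(["J18.9", "I50.9", "Z99.9"], encoding, prior)
print(encoded)
```

In this toy data, J18.9 appears three times with two short stays, so its raw rate of 2/3 is pulled toward the overall rate of 0.5, giving 0.6. The encoding should be fitted on training data only, since computing it on the evaluation set would leak outcome information into the features.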
Results
The final sample included 8,693 adult patients admitted to an internal medicine service. Random Forest models achieved the highest performance for predicting LOS under 48 hours (AUROC=0.96, 95% CI 0.95-0.97; accuracy=91%) and under 72 hours (AUROC=0.94, 95% CI 0.93-0.95; accuracy=88%). These models outperformed logistic regression using the same features (48-hour AUROC=0.57, 95% CI 0.54-0.59 and accuracy=70%; 72-hour AUROC=0.59, 95% CI 0.57-0.61 and accuracy=56%).
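The abstract does not state how the confidence intervals were obtained. As a hedged illustration of how an AUROC and a 95% CI of this kind can be computed, the sketch below uses the pairwise (Mann-Whitney) definition of AUROC and a percentile bootstrap. The bootstrap choice, the function names, and the toy labels and scores are assumptions for illustration only.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; AUROC is undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy predictions: 1 = short stay, scores = predicted probability.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

point = auroc(y_true, y_score)
ci_low, ci_high = bootstrap_ci(y_true, y_score)
print(point, ci_low, ci_high)
```

The pairwise loop is quadratic in sample size, which is fine for a demonstration. A rank-based implementation would be preferable for a cohort of several thousand patients.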
Discussion
Leveraging an innovative target encoding method, the Short Hospitalization Predictor (SHoP) model substantially outperforms previous ML approaches in predicting LOS under 48 hours and under 72 hours using only ED pre-admission data (AUROC 0.94-0.96).
Conclusion
The technical innovation and predictive capability of the SHoP model enable powerful, real-time applications for optimizing patient flow and hospital resource utilization by identifying potentially divertible admissions while patients are still in the ED.