Development of a Machine Learning Model for Predicting In-Hospital Mortality and Analyzing Associated Risk Factors Using Large Patient Samples

Jinxin Liu
Haoyue He
Yanglingxi Wang
Jun Du
Kaixin Liang
Jun Xue
Yidan Liang
Peng Chen
Qiang Yang
Ying Yin
Guixue Wang
Xue Jiang
Yongbing Deng

Abstract

Objective

This study aims to build a machine learning model that predicts in-hospital mortality in ischemic stroke patients and to analyze the associated risk factors, using a large dataset from four hospitals in Chongqing.

Methods

We collected baseline data covering demographics, medical history, laboratory tests, and imaging indicators from 23,307 ischemic stroke patients. NIHSS scores were taken from admission records, and in-hospital survival status and causes of death were recorded. Missing values were imputed with the missForest method, and class imbalance was addressed by random oversampling, with five-fold cross-validation used for validation. The SHAPRFECV technique was used to select the most influential features while avoiding multicollinearity. Several machine learning models, including logistic regression (LR), random forest (RF), and k-nearest neighbors (KNN), were tuned by grid search with three-fold cross-validation to optimize hyperparameters.
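The resampling step can be sketched in a few lines. The abstract does not say whether oversampling was applied inside each cross-validation fold; the sketch below assumes it was, because duplicating minority-class patients before splitting would let copies of the same patient land in both the training and test folds. All function names and the toy labels are hypothetical and use only the Python standard library; this is an illustration of the technique, not the authors' pipeline.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def random_oversample(indices, labels, seed=0):
    """Resample smaller classes with replacement until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def oversampled_cv_splits(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; only the training part is oversampled."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield random_oversample(train_idx, labels, seed + f), test_idx


# Toy labels: 6 deaths among 100 patients (illustrative, not the study data).
labels = [1] * 6 + [0] * 94
splits = list(oversampled_cv_splits(labels, k=5))
for train_idx, test_idx in splits:
    print(Counter(labels[i] for i in train_idx), len(test_idx))
```

Each training set ends up with equal numbers of survivors and deaths, while every test fold keeps the original, imbalanced class mix, so the reported metrics reflect the real prevalence.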

Results

The cohort had a mean age of 67.347 ± 12.822 years and a baseline NIHSS score of 8.430 ± 3.162; 51.186% of patients were male, and the in-hospital mortality rate was 6.183%. The Random Forest model achieved the highest AUC on the test set (0.940), followed by CatBoost (0.937), LightGBM (0.930), and XGBoost (0.929). CatBoost had the highest F1 score on the test set (0.595), and its predictive performance did not differ significantly from that of the Random Forest model (p = 0.500).
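The two metrics reported here measure different things, which is why an AUC near 0.94 can coexist with an F1 score near 0.6 when only about 6% of patients die. A minimal sketch, using made-up scores and a hypothetical 0.5 decision threshold (the abstract does not state the threshold used), shows the effect: AUC depends only on how positives rank against negatives, whereas F1 depends on the thresholded predictions and is pulled down by false positives from the much larger negative class.

```python
def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative scores only: six survivors (0) and two deaths (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.65, 0.9]

auc = roc_auc(y_true, scores)                 # 11 of 12 positive/negative pairs ranked correctly
y_pred = [int(s >= 0.5) for s in scores]      # hypothetical threshold of 0.5
f1 = f1_score(y_true, y_pred)                 # precision 0.5, recall 1.0

print(f"AUC = {auc:.3f}, F1 = {f1:.3f}")      # AUC = 0.917, F1 = 0.667
```

This is one plausible reading of the gap between the two numbers, not an explanation given by the authors.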

Conclusion

Using data from four hospitals in Chongqing, our machine learning model relies only on baseline features, which simplifies clinical application while maintaining robust predictive performance. It also supports a detailed analysis of mortality risk factors and can serve as a reference for clinical decision-making. Future work will validate the model in larger, geographically diverse samples to broaden its clinical applicability.

Version published to 10.1101/2025.02.27.25323062 on medRxiv
Mar 1, 2025

Related articles

ICU Mortality and LOS Prediction Models Using MachineLearning Based on Both Real and Simulated Data

This article has 3 authors:
1. Girma Neshir Alemneh
2. Hirut Bekele Ashagrie
3. Lemlem Kassa Tegegne
This article has no evaluations. Latest version Jan 14, 2026

Machine learning models for predicting severe clinical events in hospitalized patients with coronary artery disease

This article has 16 authors:
1. Hao Liu
2. Meijun Liu
3. Xinmiao Guan
4. Feng Cao
5. Changhao Liang
6. Zhongwen Qi
7. Jiaqi Hui
8. Junnan Zhao
9. Jingli Xing
10. Jianguo Zhou
11. Dong Zhang
12. Lei Liu
13. Xiaoliang Hao
14. Minjing Luo
15. Fengqin Xu
16. Yutong Fei
This article has no evaluations. Latest version Jan 12, 2026

Risk Stratification for In-Hospital Mortality in Alzheimer’s Disease Using Interpretable Regression and Explainable AI

This article has 3 authors:
1. Tursun Alkam
2. Ebrahim Tarshizi
3. Andrew H. Van Benschoten
This article has no evaluations. Latest version Jan 7, 2026
