Integration of Clinical Indicators and Multiple Machine Learning Algorithms for Prognostic Evaluation in Sepsis Patients with Different BMI: Model Construction and Validation

Zhanzhi Long
Shijun Tong

Abstract

Objective: Sepsis is a leading cause of intensive care unit (ICU) mortality, and body mass index (BMI) contributes to prognostic heterogeneity through the so-called "obesity paradox." This study aimed to develop and validate a BMI-stratified prognostic model for 28-day mortality in sepsis patients by integrating clinical indicators with multiple machine learning (ML) algorithms, and to explore BMI-specific predictive patterns.

Methods: We conducted a retrospective analysis of 654 sepsis patients admitted between August 2022 and August 2024. We collected demographic data (age, gender), anthropometric data (weight and height, used to calculate BMI), vital signs (heart rate, respiratory rate, SpO₂), laboratory indicators (CRP, PCT, D-Dimer, PT, APTT), and severity scores (SOFA, APACHE II, GCS). Patients were stratified into three BMI groups: underweight (BMI < 18.5 kg/m², n = 98), normal weight (BMI 18.5–23.9 kg/m², n = 276), and overweight/obese (BMI ≥ 24 kg/m², n = 280). Eight ML algorithms were employed: Regularized Logistic Regression (LR-L1/L2), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), CatBoost, Support Vector Machine (SVM) with an RBF kernel, and Multi-Layer Perceptron (MLP). Data preparation and feature selection comprised multiple imputation of missing values, outlier handling with the interquartile range (IQR) method, and recursive feature elimination (RFE). Model performance was evaluated with 10-fold cross-validation (CV) and external validation (15% of the cohort) using AUC, sensitivity, specificity, accuracy, and F1-score. SHAP (SHapley Additive exPlanations) values and permutation importance were used for interpretability. Subgroup analyses compared model performance across BMI strata.

Results: The 28-day mortality rate was highest among underweight patients (45.9%), followed by normal-weight patients (25.3%) and overweight/obese patients (18.7%), a trend consistent with the obesity paradox (p < 0.001).
RFE identified 13 key predictors, including BMI group, SOFA score, APACHE II score, PCT, CRP, D-Dimer, initial heart rate, SpO₂, mechanical ventilation duration, age, and underlying disease. Among the ML models, CatBoost showed the best overall performance (training set: AUC = 0.90, 95% CI: 0.87–0.93; sensitivity = 0.86; specificity = 0.84; accuracy = 0.85; external validation: AUC = 0.88, 95% CI: 0.82–0.93). Subgroup analysis showed that (1) in the underweight group, XGBoost performed best (AUC = 0.87), with SOFA score and D-Dimer as the top predictors; (2) in the normal-weight group, LightGBM was optimal (AUC = 0.86), driven by APACHE II score and PCT; and (3) in the overweight/obese group, CatBoost performed best (AUC = 0.89), with BMI, CRP, and mechanical ventilation duration as key features. SHAP analysis showed that a SOFA score > 10 was consistently associated with significantly increased mortality risk across all BMI groups, whereas a BMI of 24–28 kg/m² was protective only in patients aged ≥ 65 years.

Conclusion: The CatBoost model integrating clinical indicators and BMI stratification showed robust performance for 28-day mortality prediction in this cohort of 654 sepsis patients. BMI-specific ML models and SHAP-based interpretability enable personalized risk stratification and support BMI-tailored sepsis management.
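The reported metrics (AUC, sensitivity, specificity, accuracy, F1-score) and a 95% CI for the AUC can be computed from a model's predicted mortality probabilities, as in the standard-library-only sketch below. It is illustrative, not the authors' pipeline: the function names, the 0.5 decision threshold, and the percentile-bootstrap CI are assumptions, because the abstract does not state how the threshold or the confidence intervals were obtained. AUC is computed as the Mann-Whitney probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half.

```python
import random

# Hypothetical helpers for illustration; threshold and CI method are assumptions.


def auc(y_true: list[int], y_score: list[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, accuracy and F1 at a fixed (assumed) threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    acc = (tp + tn) / len(y_true)
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc, "f1": f1}


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed CI method)."""
    rng = random.Random(seed)
    idx = range(len(y_true))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # resample patients with replacement
        ys = [y_true[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # both classes are needed to define an AUC
            stats.append(auc(ys, [y_score[i] for i in sample]))
    stats.sort()
    k = int(round(alpha / 2 * n_boot))
    return stats[k], stats[n_boot - k - 1]


if __name__ == "__main__":
    # Toy labels (1 = died within 28 days) and predicted risks; not study data.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    p = [0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9]
    print(auc(y, p))  # 0.875
    print(threshold_metrics(y, p))  # all four metrics equal 0.75
    print(bootstrap_auc_ci(y, p))
```

In practice, `y_score` would be the predicted probabilities from a fitted model (for example CatBoost) on the cross-validation folds or the validation set; the pairwise AUC loop is quadratic in sample size, which is fine for a cohort of this size.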

Version published to 10.21203/rs.3.rs-8038714/v1 on Research Square
Nov 18, 2025

