Machine learning prediction and interpretive analysis of multidrug-resistant microbial infection risk in septicemia patients: A study from the MIMIC-IV database
Abstract
Objective: To construct and compare six machine learning models for predicting multidrug-resistant organism (MDRO) infection in sepsis patients, and to identify its key risk factors, using the MIMIC-IV (v3.1) database.
Methods: We conducted a retrospective cohort study of ICU patients in the MIMIC-IV database who met the Sepsis 3.0 diagnostic criteria. Preprocessing included missing-value handling, removal of constant variables, and standardization. Key predictors were selected using LASSO regression followed by the Boruta algorithm. Six machine learning models were developed: LightGBM (LGBM), random forest (RF), CatBoost, gradient boosting decision tree (GBDT), multilayer perceptron (MLP), and k-nearest neighbors classifier (KNNC). SHAP was applied for interpretability. Performance was evaluated by AUC, sensitivity, specificity, F1-score, and accuracy. Decision curve analysis (DCA) and calibration curves were used to assess clinical utility.
Results: Among 23,191 patients, 2,806 (12.1%) had MDRO infections. Two-stage feature selection (LASSO + Boruta) identified nine core predictors: age, platelet count, red cell distribution width (RDW), blood glucose, lactic acid, partial pressure of oxygen (PO2), Acute Physiology Score III (APS III), hypertension (HTN), and acute kidney injury (AKI). The LGBM model achieved the best performance (test AUC = 0.964, accuracy = 0.904, F1-score = 0.925). DCA demonstrated significant net clinical benefit for the LGBM and CatBoost models across threshold probabilities of 0.2–0.6. SHAP analysis identified HTN and AKI as the leading risk drivers for MDRO infection, while higher PO2 was the primary protective factor.
Conclusion: Machine learning models, particularly LGBM, effectively identify ICU sepsis patients at high risk of MDRO infection. Key clinical features (e.g., HTN, AKI, PO2, RDW, lactic acid, APS III), combined with SHAP-based interpretability, provide a robust decision-support tool for early risk stratification and for optimizing antimicrobial stewardship.
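The decision curve analysis mentioned in the Methods rests on the net benefit of a model at a given threshold probability. The following is a minimal sketch of that quantity, assuming the standard definition (true-positive rate minus false-positive rate weighted by the odds of the threshold). The function names and the toy data are hypothetical illustrations, not the authors' code or the study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold.

    Assumes the standard decision-curve definition:
        NB = TP/N - FP/N * (pt / (1 - pt))
    where pt is the threshold probability.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Toy example (hypothetical labels and predicted probabilities).
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
for pt in (0.2, 0.4, 0.6):  # thresholds inside the 0.2-0.6 range reported above
    print(pt, net_benefit(labels, probs, pt), treat_all_net_benefit(labels, pt))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero, which is the net benefit of treating no one.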