Case-Control Matching Erodes Feature Discriminability for Machine Learning-Based Sepsis Prediction in ICUs: A Retrospective Cohort Study

Sophia Ehlers
Youssef Farag
Fanny Tranchellini
Tim Hahn
Catherine Jutzeler
Lakmal Meegahapola

Abstract

Background: Sepsis is a leading cause of mortality in the intensive care unit (ICU), and early detection using machine learning (ML) models is critical for timely intervention. To address methodological challenges such as class imbalance and differences in patient trajectories, researchers increasingly adapt case-control matching from epidemiology. However, its impact on predictive performance in ICU sepsis prediction remains insufficiently understood.

Methods: We conducted a retrospective multi-cohort analysis using three large harmonized ICU datasets: HiRID, MIMIC-IV, and eICU. We evaluated the effects of case-control matching on both feature discriminability and predictive performance. Matching strategies incorporated temporal alignment and demographic criteria, and were compared against the original imbalanced cohorts and against undersampled cohorts at equivalent case-to-control ratios. To quantify changes in feature significance, we applied linear mixed-effects models across clinical variables. We then trained multiple ML models, including random forests, balanced random forests, LightGBM, XGBoost, logistic regression, and convolutional neural networks, and evaluated performance on the original test sets using AUROC and normalized AUPRC.

Results: Case-control matching consistently reduced the number of significant predictive features across all three cohorts. In the original datasets, 35 to 43 features showed significant differences between septic and non-septic patients, whereas this number declined to 24 to 29 in the most strongly matched settings. In contrast, undersampling largely preserved feature discriminability. Models trained on the original imbalanced cohorts showed robust performance, while models trained on undersampled cohorts often achieved very strong discrimination. However, models trained on matched cohorts degraded markedly, with AUROC values frequently falling to around 0.50 and normalized AUPRC dropping to baseline or below. These patterns were consistent across datasets, matching ratios, and model classes.

Conclusion: Case-control matching creates a critical trade-off in ML-based sepsis prediction: although it satisfies the epidemiological objective of balancing cohorts, it removes clinically informative differences that are essential for prediction. Our findings caution against the uncritical transfer of methods designed for causal inference into predictive modeling tasks in the ICU and highlight the need for strategies that preserve predictive signal while addressing dataset imbalance.
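
The contrast between undersampling and matching described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: the synthetic cohort, the single `age` feature, the risk model, the greedy 1:1 nearest-neighbor matching, and the one-year caliper are all assumptions made for illustration. It only shows the mechanism for the matching variable itself, whereas the study reports a broader loss of discriminability across clinical features.

```python
import numpy as np

def simulate_cohort(n=5000, seed=0):
    """Synthetic cohort (hypothetical): older patients are more likely to be cases."""
    rng = np.random.default_rng(seed)
    age = rng.normal(60, 15, n)
    risk = 1 / (1 + np.exp(-(-3.0 + 0.05 * (age - 60))))  # assumed risk model
    label = rng.random(n) < risk
    return age, label

def undersample(label, ratio=1, seed=0):
    """Keep all cases and draw controls at random, ignoring their features."""
    rng = np.random.default_rng(seed)
    cases = np.flatnonzero(label)
    controls = np.flatnonzero(~label)
    k = min(len(controls), ratio * len(cases))
    return cases, rng.choice(controls, size=k, replace=False)

def match_on_age(age, label, caliper=1.0):
    """Greedy 1:1 matching: pair each case with the closest unused control in age."""
    available = list(np.flatnonzero(~label))
    matched_cases, matched_controls = [], []
    for c in np.flatnonzero(label):
        if not available:
            break
        diffs = np.abs(age[available] - age[c])
        j = int(np.argmin(diffs))
        if diffs[j] <= caliper:  # cases without a close control are dropped
            matched_cases.append(c)
            matched_controls.append(available.pop(j))
    return np.array(matched_cases, dtype=int), np.array(matched_controls, dtype=int)

def mean_gap(age, cases, controls):
    """Difference in mean age between cases and controls."""
    return float(age[cases].mean() - age[controls].mean())

age, label = simulate_cohort()
cases_u, controls_u = undersample(label)
cases_m, controls_m = match_on_age(age, label)

gap_under = mean_gap(age, cases_u, controls_u)
gap_matched = mean_gap(age, cases_m, controls_m)
print(f"undersampled cohort: case-control age gap = {gap_under:.2f} years")
print(f"matched cohort:      case-control age gap = {gap_matched:.2f} years")
```

In this toy setting, random undersampling balances the classes while leaving the age difference between cases and controls intact, whereas matching on age removes that difference by construction, so a model trained on the matched cohort cannot learn from it.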

Version published to 10.21203/rs.3.rs-9066923/v1 on Research Square
Apr 9, 2026

Related articles

Unveiling Clinical Heterogeneity in Statin-Treated Sepsis Patients: A Machine Learning-Based Subphenotyping Study Leveraging the MIMIC-IV Database

This article has 8 authors:
1. Liangzhe Zou
2. Xinbei Zhang
3. Hongxia Cai
4. Yao Zhou
5. Gaoke Kong
6. Zhimei Gao
7. Fengshou Gao
8. Su Tu
This article has no evaluations. Latest version: Apr 9, 2026

Predicting Mortality Risk in Sepsis-Induced Early Coagulopathy: A Multicenter Comparison of Machine Learning and Nomogram Approaches

This article has 2 authors:
1. Hongwei Duan
2. Yan Huang
This article has no evaluations. Latest version: Apr 12, 2026

Dynamic RDW Trajectories Predict Mortality in Sepsis-Associated Delirium: A Group-Based Trajectory Modeling Study

This article has 5 authors:
1. Shuyang Dai
2. Bingjie Li
3. Zongshan Zhang
4. Gaoli Zhang
5. Poshi Xu
This article has no evaluations. Latest version: Apr 3, 2026
