Case-Control Matching Erodes Feature Discriminability for Machine Learning-Based Sepsis Prediction in ICUs: A Retrospective Cohort Study


Abstract

Background: Sepsis is a leading cause of mortality in the intensive care unit (ICU), and early detection using machine learning (ML) models is critical for timely intervention. To address methodological challenges such as class imbalance and differences in patient trajectories, researchers increasingly adapt case-control matching from epidemiology. However, its impact on predictive performance in ICU sepsis prediction remains insufficiently understood.

Methods: We conducted a retrospective multi-cohort analysis using three large harmonized ICU datasets: HiRID, MIMIC-IV, and eICU. We evaluated the effects of case-control matching on both feature discriminability and predictive performance. Matching strategies incorporated temporal alignment and demographic criteria, and were compared against the original imbalanced cohorts and against undersampled cohorts at equivalent case-to-control ratios. To quantify changes in feature significance, we applied linear mixed-effects models across clinical variables. We then trained multiple ML models, including random forests, balanced random forests, LightGBM, XGBoost, logistic regression, and convolutional neural networks, and evaluated performance on the original test sets using AUROC and normalized AUPRC.

Results: Case-control matching consistently reduced the number of significant predictive features across all three cohorts. In the original datasets, 35 to 43 features showed significant differences between septic and non-septic patients, whereas this number declined to 24 to 29 in the most strongly matched settings. In contrast, undersampling largely preserved feature discriminability. Models trained on the original imbalanced cohorts showed robust performance, and models trained on undersampled cohorts often achieved very strong discrimination. However, models trained on matched cohorts degraded markedly, with AUROC values frequently falling to around 0.50 and normalized AUPRC dropping to baseline or below. These patterns were consistent across datasets, matching ratios, and model classes.

Conclusion: Case-control matching creates a critical trade-off in ML-based sepsis prediction: although it satisfies the epidemiological objective of balancing cohorts, it removes clinically informative differences that are essential for prediction. Our findings caution against the uncritical transfer of methods designed for causal inference into predictive modeling tasks in the ICU and highlight the need for strategies that preserve predictive signal while addressing dataset imbalance.
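
The contrast between undersampling and matching can be illustrated with a small synthetic sketch. It is not the study's pipeline: the data are simulated, the function names are hypothetical, and controls are matched directly on a single feature, which is an extreme stand-in for matching variables (such as temporal alignment or demographics) that happen to correlate with clinical features. The standardized mean difference is used here as a simple proxy for feature discriminability, not the linear mixed-effects models described above.

```python
import random
import statistics


def standardized_mean_difference(cases, controls):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(cases) + statistics.variance(controls)) / 2) ** 0.5
    return (statistics.mean(cases) - statistics.mean(controls)) / pooled_sd


def random_undersample(controls, n, rng):
    """Draw n controls at random, ignoring their feature values."""
    return rng.sample(controls, n)


def nearest_neighbor_match(cases, controls):
    """Greedy 1:1 matching without replacement on the feature value.

    Assumption for illustration: the matching variable IS the feature,
    so matching removes the case-control difference almost entirely.
    """
    available = list(controls)
    matched = []
    for case_value in cases:
        idx = min(range(len(available)), key=lambda i: abs(available[i] - case_value))
        matched.append(available.pop(idx))
    return matched


rng = random.Random(0)

# Synthetic feature: shifted upward in "septic" cases (hypothetical effect size).
cases = [rng.gauss(1.0, 1.0) for _ in range(200)]
controls = [rng.gauss(0.0, 1.0) for _ in range(4000)]

smd_original = standardized_mean_difference(cases, controls)
smd_undersampled = standardized_mean_difference(
    cases, random_undersample(controls, len(cases), rng)
)
smd_matched = standardized_mean_difference(
    cases, nearest_neighbor_match(cases, controls)
)

print(f"original imbalanced cohort: SMD = {smd_original:.2f}")
print(f"1:1 random undersampling:   SMD = {smd_undersampled:.2f}")
print(f"1:1 matched on the feature: SMD = {smd_matched:.2f}")
```

In this toy setting, random undersampling keeps the separation between groups roughly intact because controls are dropped independently of their feature values, whereas matching selects controls that resemble the cases and so shrinks the difference a model could learn from.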
