A Complication-Stratified Dual-Stage Ensemble Model for Predicting Postoperative Outcomes in Gastric Cancer Surgery

Abstract

Background: Postoperative outcomes after gastric cancer surgery are characterized by an inherent and irreducible class imbalance (complications occur in 15–20% of patients), which lowers model sensitivity for high-risk patients, and by heterogeneous length-of-stay (LOS) trajectories. This study aimed to develop and validate a complication-stratified dual-stage ensemble model that addresses these methodological challenges. Methods: In this retrospective study of 355 gastrectomy patients, we implemented a two-stage framework: (1) a recall-constrained soft-voting ensemble classifier (Logistic Regression, Random Forest, GBDT, LightGBM, XGBoost) for predicting moderate-to-severe complications (Clavien–Dindo ≥II), using Borderline-SMOTE+ENN to handle imbalance; and (2) a complication-stratified ensemble regressor for LOS prediction. Soft voting and bootstrap aggregating (7 rounds, 85% sampling) were used to enhance stability. Performance was assessed via five-fold cross-validation with stability checks and on an independent test set (n=89). SHAP was used for interpretability. Results: The cohort's complication rate was 17.5% (62/355). The classifier achieved a cross-validation recall of 0.91 and, after threshold optimization, a test recall of 1.00 ± 0.00 (identifying all 23 complicated test patients), with precision of 0.68 ± 0.12 and an F1-score of 0.81 ± 0.08. The stratified regressor yielded an overall test MAE of 2.56 ± 0.23 days, with lower error for uncomplicated patients (MAE = 1.84 ± 0.22 days) and higher error for complicated patients (MAE = 4.73 ± 0.49 days). SHAP identified inflammatory ratios (CRP_ratio, PCT_ratio) and recovery metrics (drainage duration, oral feeding time) as key predictors. Conclusions: This study presents a generalizable strategy for handling irreducible class imbalance by designing models around clinical realities. The framework improves risk stratification and LOS prediction for gastric cancer surgery, with the potential to enhance patient care and resource allocation.
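
To make the first stage concrete, the sketch below illustrates the two ideas named in the Methods: soft voting (averaging each model's predicted complication probability) and recall-constrained thresholding (choosing a cut-off that keeps recall at or above a target). It is a minimal illustration only. The function names, the toy probabilities, and the rule of taking the highest threshold that meets the recall target are assumptions for demonstration; the abstract does not specify how the authors optimized their threshold.

```python
from typing import List, Sequence, Tuple


def soft_vote(prob_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Average positive-class probabilities across models.

    Rows are models, columns are patients (hypothetical layout).
    """
    n_models = len(prob_matrix)
    return [sum(col) / n_models for col in zip(*prob_matrix)]


def recall_constrained_threshold(
    y_true: Sequence[int], probs: Sequence[float], min_recall: float = 0.9
) -> Tuple[float, float, float]:
    """Return (threshold, precision, recall) for the highest threshold
    whose recall is at least min_recall.

    Recall can only grow as the threshold drops, so scanning candidate
    thresholds from high to low and stopping at the first hit works.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("y_true contains no positive cases")
    for t in sorted(set(probs), reverse=True):
        flagged = [p >= t for p in probs]
        tp = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
        fp = sum(1 for y, f in zip(y_true, flagged) if y == 0 and f)
        recall = tp / n_pos
        if recall >= min_recall:
            return t, tp / (tp + fp), recall
    raise ValueError("min_recall must be at most 1.0")


# Toy data (invented): three models scoring six patients.
model_probs = [
    [0.9, 0.2, 0.6, 0.1, 0.4, 0.3],
    [0.8, 0.3, 0.5, 0.2, 0.5, 0.1],
    [0.7, 0.1, 0.7, 0.3, 0.3, 0.2],
]
y_true = [1, 0, 0, 0, 1, 0]  # 1 = complication (hypothetical labels)

ensemble_probs = soft_vote(model_probs)
threshold, precision, recall = recall_constrained_threshold(
    y_true, ensemble_probs, min_recall=1.0
)
print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, requiring full recall pushes the threshold down far enough that one uncomplicated patient is also flagged, which mirrors the recall-versus-precision trade-off reported in the Results.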
