An Integrated Random Forest– and LASSO-Derived Nomogram for Predicting Postoperative Nosocomial Infections in Colorectal Cancer Patients
Abstract
Objective: We sought to identify the independent risk factors for postoperative nosocomial infections in colorectal cancer patients and to construct and validate a nomogram for individualized risk prediction, enabling early clinical identification of high-risk individuals and the implementation of targeted preventive strategies.

Methods: We retrospectively analyzed 1,146 colorectal cancer patients who underwent surgical resection, assigning those treated between 2020 and 2021 (n = 762) to the training set and those treated in 2022 (n = 384) to the validation set. Candidate predictors were first evaluated by univariate analysis. We then applied a random forest to quantify variable importance and used LASSO regression to refine feature selection and mitigate multicollinearity. Independent risk factors emerging from these steps were confirmed by multivariate logistic regression. Based on these determinants, we developed a nomogram for individualized risk estimation. Model performance was assessed in both cohorts: discrimination was measured by the area under the receiver operating characteristic (ROC) curve, calibration was examined through calibration plots, and clinical benefit was appraised using decision curve analysis.

Results: Postoperative nosocomial infections occurred in 9.6% (110/1,146) of patients, most frequently presenting as lower respiratory tract infections (34.6%) and surgical-site infections (30.9%). Multivariate logistic regression identified prolonged operative duration, the presence of postoperative complications, open surgical approach, ASA score ≥ III, a history of coronary artery disease, use of postoperative drainage, and persistent fever lasting ≥ 3 days as independent predictors. The resulting nomogram demonstrated excellent discrimination, with an area under the ROC curve of 0.860 (95% CI, 0.815–0.905) in the training cohort and 0.827 (95% CI, 0.774–0.880) in the validation cohort. Calibration plots showed high concordance between predicted and observed infection rates, and decision curve analysis confirmed the model’s clinical utility across relevant threshold probabilities.

Conclusions: Our nomogram stratifies colorectal cancer patients by their postoperative infection risk and highlights perioperative factors (such as operative duration, surgical approach, and ASA grade) that warrant targeted management. Prospective, multicentre validation will be essential to refine the model and establish its generalizability.
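
The two evaluation measures named in the Methods, the area under the ROC curve and the net benefit used in decision curve analysis, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc_rank` and `net_benefit` are hypothetical, and the eight toy outcomes and predicted risks are invented for illustration only and are not study data. The sketch uses the standard rank-based (Mann-Whitney) form of the AUC and the usual net-benefit formula, TP/n − (FP/n) × pt/(1 − pt), at a threshold probability pt.

```python
from typing import Sequence


def auc_rank(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (infected, non-infected) pairs in which the
    infected patient has the higher predicted risk; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk is at or
    above `threshold`: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data (NOT from the study): 1 = infection, 0 = no infection.
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]
risks = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]

toy_auc = auc_rank(outcomes, risks)
# Net benefit of the model at a 25% threshold versus a "treat everyone" policy,
# which corresponds to assigning every patient a predicted risk of 1.0.
nb_model = net_benefit(outcomes, risks, 0.25)
nb_treat_all = net_benefit(outcomes, [1.0] * len(outcomes), 0.25)

print(f"AUC = {toy_auc:.3f}")
print(f"net benefit (model) = {nb_model:.3f}")
print(f"net benefit (treat all) = {nb_treat_all:.3f}")
```

A decision curve repeats the net-benefit calculation over a range of thresholds and compares the model against the treat-all and treat-none (net benefit 0) strategies; a model is considered clinically useful at thresholds where its curve lies above both.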