Risk Prediction in Spine Surgery: Traditional Models, Artificial Intelligence, and the Challenge of Clinical Translation
Abstract
Background. Accurate perioperative risk stratification is central to patient safety, informed consent, and resource allocation in spine surgery. Traditional regression-based risk scores and indices are widely used and clinically familiar, yet their predictive performance remains modest and may not capture the heterogeneity of modern spine practice. Concurrently, artificial intelligence and machine learning (AI/ML) approaches have been increasingly applied to surgical risk prediction, raising questions regarding their incremental value, interpretability, and readiness for clinical adoption. The objective of this review was to synthesize existing evidence on spine surgery risk prediction models, comparing traditional approaches with emerging AI/ML methods, and to identify key translational barriers.

Methods. A structured literature search was conducted across major biomedical databases using combinations of spine surgery, risk prediction, perioperative outcomes, and AI-related terms, with additional conceptual searches targeting explainability, validation, and clinical translation. Studies were selected for relevance to adult spine surgery risk prediction, model development or validation, and methodological or translational considerations. Given substantial heterogeneity in study design and outcomes, findings were synthesized qualitatively using a narrative approach.

Results. Traditional spine-specific risk models demonstrate fair to good discrimination for common outcomes, with typical AUCs ranging from approximately 0.64 to 0.78. AI/ML models often report modest improvements in discrimination over regression-based approaches, particularly for common outcomes such as ICU admission and mortality, but gains are inconsistent and context dependent. Across both model types, limited external validation, calibration drift, limited prospective outcome evidence, and challenges related to interpretability and workflow integration remain prominent.

Conclusions. Traditional risk models remain interpretable, trusted, and competitive in performance for many spine surgery outcomes. While AI/ML approaches expand data integration and interaction modeling, their clinical impact is constrained by validation, trust, and implementation barriers. Future progress will depend less on incremental performance gains and more on rigorous external validation, prospective outcome studies, and integration into clinical workflows.