Comparative Evaluation of Machine Learning and Deep Learning Models for Early Prediction of Severe Acute Pancreatitis: A Multi-Model Study Using the 2012 Revised Atlanta Classification

Abstract

Background

Acute pancreatitis (AP) is a common gastrointestinal emergency in which a subset of patients progresses to severe acute pancreatitis (SAP), a condition that carries substantial morbidity and mortality. Current clinical severity scores such as BISAP, APACHE II, Ranson, and the Modified CT Severity Index require 24 to 48 hours of observation before reliable assessment is possible, which limits early triage. Machine learning (ML) approaches using routine admission laboratory values may enable earlier, more accurate prediction.

Methods

We evaluated 11 models spanning three architectural families: classical ML (Logistic Regression, Random Forest, Gradient Boosting), feedforward deep learning (MLP, Residual MLP, Attention MLP), and recurrent deep learning (LSTM, Stacked LSTM, Bidirectional LSTM, LSTM+Attention, CNN-LSTM). All models were evaluated on a Chinese AP cohort of 722 patients (585 severe, 137 mild) labelled according to the 2012 Revised Atlanta Classification. Performance was assessed by 5-fold stratified cross-validation using AUC-ROC, F1 score, sensitivity, specificity, and positive predictive value (PPV), with decision thresholds chosen to maximise F1.
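The two evaluation steps named above, stratified fold assignment and F1-maximising threshold selection, can be sketched in plain Python. This is an illustrative sketch, not the study's code: the function names, the round-robin fold assignment, the random seed, and the toy scores are all assumptions. The abstract also does not say whether thresholds were tuned on training or held-out folds. Only the class counts (585 severe, 137 mild) come from the text.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share per class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def f1_at_threshold(y_true, scores, threshold):
    """F1 when samples with score >= threshold are predicted positive (severe)."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores):
    """Try every observed score as a cut-off; return (threshold, F1) of the best."""
    best = (None, -1.0)
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best[1]:
            best = (t, f1)
    return best


# Class counts from the abstract: 585 severe (label 1), 137 mild (label 0).
labels = [1] * 585 + [0] * 137
folds = stratified_folds(labels, k=5, seed=0)
severe_per_fold = [sum(labels[i] for i in fold) for fold in folds]

# Made-up scores, used only to demonstrate threshold selection.
toy_y = [0, 0, 1, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.60, 0.90]
threshold, f1 = best_f1_threshold(toy_y, toy_scores)
```

With these counts each fold receives 117 severe cases and 27 or 28 mild cases, so every fold keeps roughly the cohort's 81% severe prevalence. In practice the threshold would normally be selected on training data within each fold to avoid optimistic bias.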

Results

Random Forest achieved the highest AUC (0.877; F1 = 0.917, sensitivity = 96.8%, PPV = 87.1%), followed closely by Gradient Boosting (AUC = 0.874, F1 = 0.918). Classical ML models consistently outperformed their deep learning counterparts. CNN-LSTM was the best-performing recurrent model (AUC = 0.777) but remained inferior to all classical approaches. LSTM-family models produced AUC values of 0.684 to 0.777, likely because the data are cross-sectional and tabular, with no sequential structure for recurrent models to exploit.
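As a quick consistency check on the reported Random Forest figures, F1 is the harmonic mean of PPV (precision) and sensitivity (recall), so it can be recomputed from the two values quoted above. The helper name below is hypothetical, and the inputs are the rounded values from the abstract, so agreement is only expected to about three decimal places.

```python
def f1_from_ppv_and_sensitivity(ppv, sensitivity):
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Rounded Random Forest values as reported in the abstract.
rf_f1 = f1_from_ppv_and_sensitivity(ppv=0.871, sensitivity=0.968)
print(round(rf_f1, 3))  # 0.917
```

The recomputed value matches the reported F1 of 0.917.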

Conclusions

Random Forest provides robust, high-sensitivity early prediction of SAP using routine admission data. The findings support the clinical utility of classical ML on structured medical data and highlight the limited added value of recurrent architectures for tabular single-encounter datasets. External prospective validation is required before clinical deployment.
