Comparative Evaluation of Machine Learning and Deep Learning Models for Early Prediction of Severe Acute Pancreatitis: A Multi-Model Study Using the 2012 Revised Atlanta Classification

Netanel Stern

Abstract

Background

Acute pancreatitis (AP) is a common gastrointestinal emergency; a subset of patients progress to severe acute pancreatitis (SAP), which carries substantial morbidity and mortality. Current clinical severity scores such as BISAP, APACHE II, Ranson, and the Modified CT Severity Index require 24 to 48 hours of observation before reliable assessment is possible, which limits early triage. Machine learning (ML) approaches using routine admission laboratory values may enable earlier, more accurate prediction.

Methods

We evaluated 11 models spanning three architectural families: classical ML (Logistic Regression, Random Forest, Gradient Boosting), feedforward deep learning (MLP, Residual MLP, Attention MLP), and recurrent deep learning (LSTM, Stacked LSTM, Bidirectional LSTM, LSTM+Attention, CNN-LSTM). All models were evaluated on a Chinese AP cohort of 722 patients (585 severe, 137 mild) labelled according to the 2012 Revised Atlanta Classification. Performance was assessed via 5-fold stratified cross-validation using AUC-ROC, F1 score, sensitivity, specificity, and positive predictive value (PPV), with decision thresholds optimised for maximal F1.
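The evaluation protocol described above can be illustrated with a minimal, standard-library-only sketch. It is not the author's code: the function names (`stratified_folds`, `auc_roc`, `metrics_at`, `best_f1_threshold`) are hypothetical, fold assignment is done without shuffling for simplicity, and the abstract does not say whether thresholds were tuned on training or held-out folds.

```python
from typing import Dict, List, Sequence, Tuple


def stratified_folds(y_true: Sequence[int], k: int = 5) -> List[int]:
    """Assign a fold index to each sample, dealing each class round-robin
    so every fold keeps roughly the overall class ratio."""
    fold = [0] * len(y_true)
    for label in set(y_true):
        members = [i for i, y in enumerate(y_true) if y == label]
        for j, i in enumerate(members):
            fold[i] = j % k
    return fold


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores higher than a
    random negative, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(y_true: Sequence[int], scores: Sequence[float],
               threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and F1 when predicting positive
    for every score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


def best_f1_threshold(y_true: Sequence[int],
                      scores: Sequence[float]) -> Tuple[float, Dict[str, float]]:
    """Scan every observed score as a candidate cut-off and keep the one
    with the highest F1 (the lowest such cut-off wins ties)."""
    best_t, best_m = None, None
    for t in sorted(set(scores)):
        m = metrics_at(y_true, scores, t)
        if best_m is None or m["f1"] > best_m["f1"]:
            best_t, best_m = t, m
    return best_t, best_m
```

With labels `[1] * 585 + [0] * 137`, mirroring the cohort's class counts, `stratified_folds` places 117 severe cases in each of the five folds. In general, a threshold chosen on the same data used to report F1 gives an optimistic estimate, so a cautious variant would select it on the training folds only.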

Results

Random Forest achieved the highest AUC of 0.877 (F1 = 0.917, sensitivity = 96.8%, PPV = 87.1%), followed closely by Gradient Boosting (AUC = 0.874, F1 = 0.918). Classical ML models consistently outperformed their deep learning counterparts. CNN-LSTM was the best-performing recurrent model (AUC = 0.777) but remained inferior to all classical approaches. LSTM-family models produced AUC values of 0.684 to 0.777, likely because the data are cross-sectional and tabular and so offer no sequential structure for recurrent models to exploit.
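As a quick consistency check on the reported Random Forest figures, F1 is the harmonic mean of PPV (precision) and sensitivity (recall). The sketch below applies that standard identity to the numbers quoted above; the function name is illustrative. Metrics averaged across folds need not satisfy the identity exactly, so close agreement is indicative rather than a proof.

```python
def f1_from_ppv_and_sensitivity(ppv: float, sensitivity: float) -> float:
    """F1 as the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Reported Random Forest values: PPV = 87.1%, sensitivity = 96.8%
rf_f1 = f1_from_ppv_and_sensitivity(ppv=0.871, sensitivity=0.968)
print(round(rf_f1, 3))  # prints 0.917, matching the reported F1
```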

Conclusions

Random Forest provides robust, high-sensitivity early prediction of SAP using routine admission data. The findings support the clinical utility of classical ML on structured medical data and highlight the limited added value of recurrent architectures for tabular single-encounter datasets. External prospective validation is required before clinical deployment.

Version published to 10.64898/2026.06.20.26356146 on medRxiv, Jun 23, 2026

Related articles

Temporal Feature Engineering and Ensemble Learning for Predicting 28-Day Mortality in ICU Patients with Alcoholic Cirrhosis

This article has 7 authors:
1. Janet Sanjaya
2. Mohammadsaeed Haghi
3. Nausin Kudrot
4. Sakshie Pathak
5. Shreyas V. Chandramouli
6. Kamiar Alaei
7. Maryam Pishgar
This article has no evaluations. Latest version: Jul 2, 2026

Prediction of post-operative delirium with machine learning in abdominal surgery with comorbidity indices and laboratory values

This article has 4 authors:
1. Wesley Chorney
2. Shinhee Kang
3. Sing Hui Ling
4. Michael Lisi
This article has no evaluations. Latest version: Jul 1, 2026

RenalTransLSTM: Multi-Horizon Prediction of Acute Kidney Injury in ICU Patients using a Hybrid LSTM-Transformer Architecture

This article has 7 authors:
1. S. M. Saiful Islam Badhon
2. Mohammad Adibuzzaman
3. Abu Saleh Mohammad Mosa
4. Serdar Bozdag
5. Ana D. Cleveland
6. Junhua Ding
7. K S M Tozammel Hossain
This article has no evaluations. Latest version: Jul 6, 2026
