Machine-Learning-Based Prediction and Interpretation of Non-Erosive Reflux Disease Risk

Chunrou LONG
Haiyang HUA
Yuan LI
Xiaoxue ZHANG
Jianhui LI
Xin HAO

Abstract

Objective: To develop a machine learning (ML) model for predicting non-erosive reflux disease (NERD) risk, to interpret the optimal model using Shapley Additive Explanations (SHAP), and to create an online prediction tool.

Methods: This single-center retrospective cohort study enrolled 556 patients who underwent sedated gastroscopy at Chengde Central Hospital (June 1, 2024–June 1, 2025). Participants were allocated to a training set (n = 390) and a validation set (n = 166) by stratified random sampling at a 7:3 ratio. Predictors were selected from the clinical characteristics by LASSO regression with 10-fold cross-validation. Nine ML models were developed and compared: elastic net GLM, random forest, support vector machine, gradient boosting machine, XGBoost, artificial neural network, K-nearest neighbors, linear discriminant analysis, and elastic net regression. Performance was evaluated by F1-score, AUC, Brier score, recall, precision, and accuracy. Predictive efficacy was compared using bootstrap resampling (1000 iterations) and calibration curves, and the model with the highest calibrated AUC was selected as optimal. Decision curve analysis (DCA) quantified clinical utility. SHAP bar and summary plots were used to interpret the optimal model, and an online calculator was deployed.

Results: LASSO identified five predictors: dilation of capillary loops in the epithelial papillae of the arytenoid cartilage, waistline, non-exposed cardia glands, cardia polyps, and Hill grade III/IV gastro-oesophageal flap valve (GEFV). All models achieved AUCs > 0.770 in both the training and validation sets. After internal validation, random forest performed best (validation-set calibrated AUC: 0.805, 95% CI: 0.741–0.866). Brier scores were 0.178 (training) and 0.227 (validation). DCA showed net clinical benefit across threshold probabilities of 0.01–0.99.
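
The abstract reports AUC with a bootstrap 95% CI, Brier scores, and decision-curve net benefit without defining them. The plain-Python sketch below shows one standard way to compute each on toy data. The function names are hypothetical, the percentile bootstrap is an assumption (the abstract does not say which interval method was used), and the calibration step behind the "calibrated AUC" is not reproduced; this is not the authors' code.

```python
import random


def auc(y_true, y_score):
    """Rank-based AUC: share of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Brier score: mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (assumed method), resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC is undefined if a resample holds only one class
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(round(alpha / 2 * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of treating patients whose risk >= threshold:
    TP/n - FP/n * pt / (1 - pt)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy labels and predicted risks for illustration only; not study data.
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, p))               # 0.75
    print(round(brier(y, p), 6))   # 0.158125
    print(net_benefit(y, p, 0.5))  # 0.25
    print(bootstrap_auc_ci(y, p))
```

Evaluating `net_benefit` over a grid of thresholds (for example 0.01 to 0.99) and plotting it alongside the "treat all" and "treat none" strategies gives the decision curve; "treat none" has a net benefit of 0 at every threshold.
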
SHAP analysis ranked the predictors' contributions in the following order: dilation of capillary loops in the epithelial papillae of the arytenoid cartilage, waistline, non-exposed cardia glands, cardia polyps, and Hill grade III/IV GEFV; all were positively associated with NERD risk. The online calculator was validated locally.

Conclusion: Five key NERD predictors were identified. The SHAP-interpretable random forest model demonstrated robust performance and clinical utility. The deployed calculator may enable early prevention, personalized management, and targeted interventions for NERD.
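
SHAP attributes each individual prediction to the input features using Shapley values from cooperative game theory. The brute-force sketch below makes that definition concrete for a hypothetical three-feature risk score (`toy_risk` and its coefficients are invented for illustration; this is not the study's random forest). It treats an "absent" feature as taking a single reference value, which is a simplification: SHAP implementations estimate the effect of absent features from the training data and use much faster algorithms for tree ensembles such as random forests.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, reference):
    """Exact Shapley values for one prediction, by enumerating every coalition.

    A feature absent from a coalition is set to its reference value, so the
    attributions sum to model(x) - model(reference). Cost grows as 2^M, so this
    is only practical for a handful of features.
    """
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else reference[j] for j in range(m)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_risk(v):
    # Hypothetical risk score over three binary findings, with one interaction term.
    return 0.1 + 0.3 * v[0] + 0.2 * v[1] + 0.1 * v[0] * v[2]


if __name__ == "__main__":
    x = [1, 1, 1]          # hypothetical patient with all three findings present
    reference = [0, 0, 0]  # baseline with none present
    phi = shapley_values(toy_risk, x, reference)
    print([round(v, 4) for v in phi])  # [0.35, 0.2, 0.05]
```

The attributions sum to toy_risk(x) - toy_risk(reference) = 0.6, and the interaction term is split equally between the two features involved. A SHAP bar plot ranks features by the mean absolute value of such per-patient attributions.
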

Version published to 10.21203/rs.3.rs-7309579/v1 on Research Square
Sep 22, 2025

Related articles

Machine Learning-Based Risk Prediction Model for Fatigue in Chronic Heart Failure Patients

This article has 9 authors:
1. Min Zhou
2. Jingran Yang
3. Yimei Zhang
4. Yu Wang
5. Ruijie Yanglan
6. Qinlan Li
7. Yangjuan Bai
8. Wei Wei
9. Fang Ma
This article has no evaluations. Latest version: Jan 27, 2026

Construction of Predictive Models for Interstitial Lung Disease Risk in Sjögren’s Syndrome via Multiple Machine Learning Algorithms

This article has 3 authors:
1. qian hui li
2. xinyu sun
3. yueyue chen
This article has no evaluations. Latest version: Feb 3, 2026

Development and Validation of a Machine Learning-Based Risk Prediction Model for Ischemic Stroke-Diabetes Comorbidity

This article has 2 authors:
1. Litian Hu
2. Hongyu Sun
This article has no evaluations. Latest version: Dec 23, 2025