Comparative Performance of Linear Regression and Machine Learning Models for Predicting Glycemic Status in Uncontrolled Type 2 Diabetes: SHAP-Based Analysis

Juhaina Salim Al-Maqbali
Ibrahim Al-Zakwani
Abdullah M. Al Alawi
Mohammed Al Za’abi


Abstract

Background: This study aimed to compare the predictive performance of linear regression (LR) with that of other machine learning (ML) models and to assess the importance of clinical, biochemical, and medication-adherence predictors using SHapley Additive exPlanations (SHAP) analysis.

Methods: A cross-sectional study was conducted among adults (≥ 18 years) with type 2 diabetes mellitus (T2DM) and uncontrolled glycemia, defined as glycated hemoglobin (HbA1c) ≥ 7%; HbA1c was the primary outcome. After data preprocessing and feature selection, four supervised regression models (LR, random forest [RF], support vector regression [SVR], and extreme gradient boosting [XGBoost]) were trained and evaluated. An ANOVA F-test was used to identify the most predictive continuous variables, and SHAP analysis was used for clinical interpretation.

Results: Data from 223 patients were analyzed (mean age 57.4 ± 9.8 years; 50.7% female). LR achieved the highest coefficient of determination (R² = 0.28), while RF had the lowest mean absolute error (MAE = 1.18). SVR and XGBoost underperformed, with R² values of 0.19 and 0.07, respectively. Key predictors of high HbA1c included fasting blood glucose (FBG), diastolic blood pressure (DBP), body mass index (BMI), insulin dose, serum magnesium concentration, and medication adherence. SHAP analysis confirmed the influence of DBP, FBG, insulin dose, magnesium level, and low adherence on elevated HbA1c.

Conclusion: Although the RF model predicted HbA1c moderately well, LR outperformed the other ML models in explained variance (R²). SHAP analysis highlighted interpretable predictors, supporting the use of explainable ML models for personalized glycemic risk stratification and clinical decision-making in T2DM management. Future studies should use larger, multi-center datasets with more features and external validation to improve the prediction accuracy and generalizability of ML models.
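The abstract names the workflow (F-test feature ranking, four regression models compared by R² and MAE, SHAP interpretation) but not how it was coded. The following is a minimal Python sketch of that kind of comparison using scikit-learn on simulated data; it is not the authors' implementation. Assumptions to note: the feature names, simulated coefficients, 80/20 split, and hyperparameters are hypothetical; scikit-learn's `f_regression` (the univariate F-test for a continuous target) stands in for the study's unspecified ANOVA F-test setup; `GradientBoostingRegressor` substitutes for XGBoost to avoid an extra dependency; and SHAP values are computed with the exact closed form for a linear model with independent features rather than with the `shap` package.

```python
# Minimal sketch on SYNTHETIC data; not the study's code or data.
# Hypothetical: feature encoding, simulated effects, split, hyperparameters.
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(0)
n = 223  # matches the study's sample size; all values below are simulated
features = ["FBG", "DBP", "BMI", "insulin_dose", "magnesium", "adherence"]
X = rng.normal(size=(n, len(features)))  # standardized, hypothetical predictors
# Simulated HbA1c (%): illustrative linear signal plus noise
y = (8.5 + 0.8 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 3]
     - 0.3 * X[:, 5] + rng.normal(scale=1.2, size=n))

# Univariate F-test of each continuous predictor against the continuous outcome
F, p = f_regression(X, y)
ranking = [features[i] for i in np.argsort(F)[::-1]]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "LR": LinearRegression(),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),  # SVR is scale-sensitive
    "GBR (XGBoost stand-in)": GradientBoostingRegressor(random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = (r2_score(y_te, pred), mean_absolute_error(y_te, pred))

# Exact SHAP values for a linear model assuming independent features:
# phi_ij = beta_j * (x_ij - mean_j), with the training mean as background.
lr = models["LR"]
phi = lr.coef_ * (X_te - X_tr.mean(axis=0))
base = lr.intercept_ + lr.coef_ @ X_tr.mean(axis=0)
# Local accuracy: base value + sum of SHAP values equals each prediction
assert np.allclose(base + phi.sum(axis=1), lr.predict(X_te))
mean_abs_shap = dict(zip(features, np.abs(phi).mean(axis=0)))

print("F-test ranking:", ranking)
for name, (r2, mae) in results.items():
    print(f"{name}: R2={r2:.2f}, MAE={mae:.2f}")
print("Mean |SHAP| (LR):", {k: round(float(v), 3) for k, v in mean_abs_shap.items()})
```

Running it prints the F-test ranking, each model's test-set R² and MAE, and a global SHAP importance for the linear model. With only 223 patients, repeated k-fold cross-validation would typically give more stable performance estimates than the single split used here.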

Version published on Research Square (DOI 10.21203/rs.3.rs-7292468/v1) on Sep 11, 2025.

Related articles

Development and Validation of a Prediction Model for Microvascular Complications of Type 2 Diabetes Based on Inflammation-Metabolism Composite Indicators

This article has 2 authors:
1. LI Yuting
2. minawaer HUJIAAIHEMAITI
This article has no evaluations. Latest version: Jan 6, 2026

Development and Validation of a Machine Learning-Based Risk Prediction Model for Ischemic Stroke-Diabetes Comorbidity

This article has 2 authors:
1. Litian Hu
2. Hongyu Sun
This article has no evaluations. Latest version: Dec 23, 2025

Improvement of Risk Stratification for Diabetic Retinopathy Progression in Primary Care via an AI Model with Dynamic Anthropometric Data

This article has 4 authors:
1. bing liu
2. Jinchun Lin
3. Jianjun Zhang
4. hailong li
This article has no evaluations. Latest version: Feb 4, 2026
