Toward Interpretable Glucose Forecasting for Type 2 Diabetes: A Comparative Study among Traditional, Deep, and Large Language Models
Abstract
Type 2 diabetes mellitus (T2DM) is a prevalent chronic condition characterized by elevated blood glucose levels resulting from insulin resistance or inadequate insulin secretion. Accurate prediction of future glucose levels is essential for minimizing complications and enabling proactive management. While machine learning and deep learning models have been extensively applied in this domain, the potential of large language models (LLMs) remains underexplored, and no prior studies have systematically compared them to conventional approaches using real patient data. In this study, we evaluate three model types: traditional (XGBoost, Random Forest), deep learning (GRU, LSTM, Transformer, Ensemble), and fine-tuned LLMs (GPT-4.1, MiniGPT, LLaMA-1B, LLaMA-7B) for predicting glucose levels 30, 60, and 90 minutes ahead, using hybrid inputs of six static features and 20 prior continuous glucose monitoring (CGM) readings. GPT-4.1 achieved the best performance at 30 and 60 minutes, while LLaMA-7B excelled at 90 minutes. Among conventional models, LSTM performed best. Beyond forecasting, interpretability was a central focus: we used explainable AI (XAI) techniques to interpret the LSTM's results, while GPT-4.1 explained its predictions directly in natural language, without additional training. Notably, the study revealed an alignment between the two explanation techniques, with both highlighting recent glucose readings as the key predictors across all forecasting horizons.
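The hybrid input format described above (20 prior CGM readings concatenated with six static patient features, paired with a target reading at a fixed horizon) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the example static features, and the assumption of 5-minute CGM sampling (so a 30-minute horizon is 6 steps ahead) are all hypothetical.

```python
import numpy as np

def build_windows(cgm, static, window=20, horizon_steps=6):
    """Pair each window of `window` CGM readings with the static features
    and the reading `horizon_steps` samples later.

    With 5-minute CGM sampling (an assumption), horizon_steps=6/12/18
    corresponds to the paper's 30/60/90-minute forecasting horizons.
    """
    X, y = [], []
    for t in range(len(cgm) - window - horizon_steps + 1):
        history = cgm[t : t + window]                  # 20 prior readings
        X.append(np.concatenate([history, static]))    # hybrid input: 20 + 6 = 26 features
        y.append(cgm[t + window + horizon_steps - 1])  # glucose at the target horizon
    return np.array(X), np.array(y)

cgm = np.linspace(100, 160, 60)               # toy CGM trace in mg/dL
static = np.array([55, 1, 28.4, 7.2, 0, 1])   # hypothetical: age, sex, BMI, HbA1c, ...
X, y = build_windows(cgm, static)
print(X.shape)  # 26 columns per sample: 20 CGM readings + 6 static features
```

Each row of `X` could then feed any of the compared models, from XGBoost on the flat feature vector to an LSTM on the reshaped CGM window.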