Deep Learning for Blood Glucose Prediction: Reproducibility Challenges and Factors Affecting Differential Performance
Abstract
Blood glucose prediction is a fundamental part of advanced technology that promises to improve diabetes outcomes. However, a critical gap exists around understanding the reproducibility of state-of-the-art methods for blood glucose prediction. In this study, we curated 60 deep learning (DL)-based glucose prediction papers published between 2018 and 2025 and assessed them against seven established reproducibility criteria. We found that limited code availability, overreliance on a single public dataset, and limited use of multiple datasets for algorithm development and evaluation are amongst the top challenges to reproducibility. Next, we replicated six representative models from well-cited prior literature using 4602 days of data with over 1.25 million glucose samples from 128 persons with type 1 diabetes across three public datasets: OhioT1DM, DiaTrend, and T1DEXI. Our results show good reproducibility of DL methods when using the same code (where available) and the same evaluation dataset. However, we found poor conceptual reproducibility across datasets with significantly different diabetes management. Further analyses revealed that the accuracy of blood glucose prediction methods was significantly associated with individual diabetes management and sex/gender. All models had significantly higher prediction errors for individuals with worse glycemic control and for female subgroups than for male subgroups. To accelerate development of robust and equitable algorithms for diabetes management, we conclude with recommendations for future researchers centered on considerations for data selection, model design and selection, model evaluation and reporting of results, documentation, and code release.