Leveraging Pretrained Large Language Model for Prognosis of Type 2 Diabetes Using Longitudinal Medical Records

Abstract

Timely prognosis of type 2 diabetes (T2D) complications is critical for effective intervention and for reducing the economic burden of the disease. AI-driven large language models (LLMs) can extract clinical insights, but applying them to longitudinal medical records is challenging because these records are sparse and high-dimensional. This study demonstrates the utility of LLMs for medical time-series prediction by preprocessing the data with a missing-value mask, adding an embedding layer to a pretrained LLM, and fine-tuning both components. On the DPV registry dataset of 449,185 T2D patients, the fine-tuned model outperformed the baselines in predicting both HbA1c and LDL levels, achieving Pearson correlations of 0.749 and 0.754, absolute improvements of 0.253 and 0.259 over the baselines, respectively. The model also delivered robust long-term HbA1c predictions over a horizon of 554.3 days (95% CI: [547.0, 561.5]), with a 9% lower MSE than last-observation-based methods. Integrated Gradients analysis identified significant clinical features and visits, revealing potential biomarkers for early intervention. Overall, the results show that the predictive power of LLMs can be leveraged for T2D prognosis from sparse medical time series, supporting clinical prognosis and biomarker discovery and ultimately advancing precision medicine.
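The abstract describes the input pipeline only at a high level: a missing-value mask during preprocessing, then an embedding layer in front of a pretrained LLM. The sketch below shows one common way such a pipeline could look, under stated assumptions; it is not the authors' implementation. The feature names, the 0.0 fill value, the "values followed by mask" layout, the embedding width, and every helper name (`encode_visit`, `LinearEmbedding`, `embed_patient`, `last_observation`, `relative_mse_gain`) are hypothetical. The embedding layer here uses fixed random weights, whereas in the described setup it would be trainable and fine-tuned together with the LLM, with the per-visit embeddings fed to the model in place of token embeddings. `last_observation` is a last-observation-carried-forward forecast, one plausible reading of "last-observation-based methods", and `relative_mse_gain` expresses an MSE improvement as a fraction of the baseline MSE.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

# Hypothetical feature set for illustration; the study's actual DPV variables differ.
FEATURES = ["hba1c", "ldl", "bmi"]

Visit = Dict[str, Optional[float]]


def encode_visit(visit: Visit, features: Sequence[str] = FEATURES) -> List[float]:
    """Return [values..., mask...]; the mask is 1.0 where a feature was observed.

    Gaps are filled with 0.0 (an assumption), and the mask lets the model tell
    a measured zero apart from a missing entry.
    """
    values: List[float] = []
    mask: List[float] = []
    for name in features:
        x = visit.get(name)
        observed = x is not None and not math.isnan(x)
        values.append(float(x) if observed else 0.0)
        mask.append(1.0 if observed else 0.0)
    return values + mask


class LinearEmbedding:
    """Hypothetical input layer mapping a 2F-dim visit vector to a d-dim embedding.

    Weights are random and fixed here; a real model would train this layer
    jointly with the pretrained LLM.
    """

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Sequence[float]) -> List[float]:
        # y = W x + b, one output per row of W
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def embed_patient(visits: List[Visit], embed: LinearEmbedding) -> List[List[float]]:
    """One embedding per visit; this sequence stands in for the LLM's token embeddings."""
    return [embed(encode_visit(v)) for v in visits]


def last_observation(history: List[Optional[float]]) -> Optional[float]:
    """Baseline forecast: the most recent observed value, or None if nothing was observed."""
    for x in reversed(history):
        if x is not None:
            return x
    return None


def mse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def relative_mse_gain(model_mse: float, baseline_mse: float) -> float:
    """Fractional MSE reduction relative to the baseline (0.09 means 9% lower MSE)."""
    return (baseline_mse - model_mse) / baseline_mse


if __name__ == "__main__":
    visits = [
        {"hba1c": 7.9, "ldl": None, "bmi": 31.2},
        {"hba1c": None, "ldl": 2.8, "bmi": None},
    ]
    print(encode_visit(visits[0]))  # [7.9, 0.0, 31.2, 1.0, 0.0, 1.0]
    seq = embed_patient(visits, LinearEmbedding(in_dim=2 * len(FEATURES), out_dim=8))
    print(len(seq), len(seq[0]))  # 2 8
    print(last_observation([7.9, None, None]))  # 7.9
```

Concatenating the mask with the zero-filled values keeps the input a fixed width per visit, which is what allows a single linear layer to project every visit into the LLM's embedding space regardless of which measurements were taken.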
