Development and Validation of a Multivariable Prediction Model for Pre-diabetes and Diabetes Using Easily Obtainable Clinical Data
Abstract
Importance
In the US, pre-diabetes and diabetes are increasing in prevalence alongside other chronic diseases. Hemoglobin A1c is the most common diagnostic test for diabetes performed in the US, but it has known inaccuracies in the setting of other chronic diseases.
Objective
To determine if easily obtained clinical data could be used to improve the diagnosis of pre-diabetes and diabetes compared to hemoglobin A1c alone.
Design, Setting, and Participants
This cross-sectional study analyzed nationally representative data obtained from six 2-year cycles (2005 to 2006 through 2015 to 2016) of the National Health and Nutrition Examination Survey in the US. We excluded participants without hemoglobin A1c, oral glucose tolerance test, or sample weight data. The sample comprised 13,800 survey participants. Data analyses were performed from May 1, 2024 to February 9, 2025.
Main Outcomes and Measures
We estimated 2-hour glucose with a gradient-boosted decision tree machine learning model to diagnose pre-diabetes and diabetes, defined by an oral glucose tolerance test 2-hour glucose of at least 140 mg/dL but less than 200 mg/dL and of at least 200 mg/dL, respectively. We compared the model's area under the receiver operating characteristic curve (AUROC), calibration, positive predictive value, and net benefit on decision curve analysis with those of hemoglobin A1c alone.
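The outcome definitions and the AUROC comparison above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `classify_ogtt` and `auroc` and the toy values are hypothetical, and treating "at or above 140 mg/dL" as the positive class is an assumption, since the abstract does not state how outcomes were binarized for the AUROC. The AUROC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def classify_ogtt(two_hour_glucose_mg_dl):
    """Label a 2-hour OGTT glucose value using the cutoffs given in the abstract."""
    if two_hour_glucose_mg_dl >= 200:
        return "diabetes"
    if two_hour_glucose_mg_dl >= 140:
        return "pre-diabetes"
    return "normal"


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive case, 0 for a negative case.
    scores: any numeric score where higher means more likely positive,
            for example a model-estimated 2-hour glucose or an A1c value.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical toy values for illustration only; these are not NHANES data.
measured_glucose = [95, 150, 210, 130, 180]    # OGTT 2-hour glucose, mg/dL
estimated_glucose = [100, 145, 190, 150, 160]  # a model's estimate, mg/dL

# Assumption: positive class is "2-hour glucose at or above 140 mg/dL".
labels = [1 if g >= 140 else 0 for g in measured_glucose]

print([classify_ogtt(g) for g in measured_glucose])
print(auroc(labels, estimated_glucose))
```

The same `auroc` function can score hemoglobin A1c or fasting plasma glucose against the identical labels, which is what makes the comparison between a single test and the model like-for-like. The pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable for a sample of 13,800.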
Results
A 20-feature model outperformed hemoglobin A1c and fasting plasma glucose for diagnosis, with AUROC improving from 0.66/0.71 to 0.77 for pre-diabetes and from 0.87/0.88 to 0.91 for diabetes. The model also had a higher positive predictive value than hemoglobin A1c for diagnosis and a greater net benefit on decision curve analysis. The main features that improved diagnosis of pre-diabetes and diabetes were the standard vitals (age, height, weight, waist circumference, blood pressure, and pulse); the fasting labs (plasma glucose, insulin, triglycerides, and iron); the non-fasting labs (cholesterol, gamma-glutamyl transferase, creatinine, platelet count, segmented neutrophil percentage, urine albumin, and urine creatinine); and the social determinant of health factor Poverty Ratio.
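For readers unfamiliar with the two other metrics reported here, the following Python sketch shows how positive predictive value and net benefit are commonly computed. It uses the standard decision curve analysis definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The function names, the toy data, and the use of predicted probabilities as input are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
def confusion_counts(labels, probabilities, threshold):
    """Count true and false positives when 'probability >= threshold' is called positive."""
    tp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0)
    return tp, fp


def positive_predictive_value(labels, probabilities, threshold):
    """Fraction of positive calls that are true cases; NaN if nothing is called positive."""
    tp, fp = confusion_counts(labels, probabilities, threshold)
    return tp / (tp + fp) if (tp + fp) else float("nan")


def net_benefit(labels, probabilities, threshold):
    """Net benefit at one threshold probability (one point on a decision curve)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold probability must be strictly between 0 and 1")
    n = len(labels)
    tp, fp = confusion_counts(labels, probabilities, threshold)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: call everyone positive regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy values for illustration only; these are not NHANES data.
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.2, 0.4, 0.1, 0.1, 0.3, 0.6, 0.05]

print(positive_predictive_value(labels, probabilities, 0.2))
print(net_benefit(labels, probabilities, 0.2))
print(net_benefit_treat_all(labels, 0.2))
```

A decision curve repeats the `net_benefit` calculation across a range of threshold probabilities and plots the result against the "treat all" and "treat none" (net benefit of zero) strategies; a model is useful at a given threshold when its curve lies above both.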
Conclusions and Relevance
In this cross-sectional study of NHANES participants, we identified risk factors that could be incorporated into the electronic medical record to identify patients with potentially undiagnosed pre-diabetes and diabetes. Implementation could improve diagnosis and lead to earlier intervention on disease before it becomes severe and complications develop.
Key Points
Question
Can readily available clinical data improve diagnosis of pre-diabetes and diabetes compared to hemoglobin A1c testing alone?
Findings
In this cross-sectional study of 13,800 adults with paired hemoglobin A1c and oral glucose tolerance testing in the National Health and Nutrition Examination Survey, the rate of pre-diabetes undiagnosed by hemoglobin A1c was 8.6% and the rate of diabetes undiagnosed by hemoglobin A1c was 3.5%. A novel multivariable prediction model that included fasting plasma glucose, insulin, basic body measurements, and routinely available dyslipidemia and hepatic function labs was significantly more accurate (AUROC 0.66/0.71 to 0.77 for pre-diabetes, 0.87/0.88 to 0.91 for diabetes) than hemoglobin A1c or fasting plasma glucose alone.
Meaning
Incorporation of easily obtainable clinical data can improve diagnosis of pre-diabetes and diabetes compared to hemoglobin A1c alone.