Using Machine Learning to Investigate Predictors of Fasting Blood Glucose: Insights into Circadian Timing and Age Interactions

Abstract

Metabolic dysfunction, particularly impaired glucose regulation, is a major contributor to chronic diseases such as type 2 diabetes (T2D). While established risk factors are well characterized, the contribution of circadian-related factors remains underexplored. This study aimed to develop an explainable machine learning model to predict log-transformed fasting blood glucose (FBG) by integrating metabolic, hormonal, lifestyle, demographic, and circadian variables from the National Health and Nutrition Examination Survey (NHANES) 2017–2020. After merging multiple NHANES sub-datasets, data were processed using a leakage-resistant pipeline in which imputation, scaling, and one-hot encoding were performed only after dataset splitting and within training folds during model development. Elastic Net and XGBoost models were trained using a feature set of approximately 90 variables, including engineered circadian interaction terms. Predictive performance was evaluated using mean absolute error (MAE) and the coefficient of determination (\(R^2\)), and feature contributions were interpreted using SHapley Additive exPlanations (SHAP). The optimized XGBoost model achieved strong predictive performance (MAE = 0.0808; test-set \(R^2\) = 0.7748) using ten key predictors, including glycohemoglobin, insulin, age, gender, gamma-glutamyl transferase (GGT), race, and a Sleep Midpoint \(\times\) Age interaction. SHAP analysis identified glycohemoglobin as the dominant predictor, with the Sleep Midpoint \(\times\) Age interaction consistently ranking among the top contributors, suggesting a stronger association between sleep timing and glucose regulation in older adults. These findings support the contribution of circadian measures to FBG prediction and provide insight into the multifactorial determinants of metabolic health, with potential relevance for personalized risk assessment.
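
The leakage-resistant preprocessing, the Sleep Midpoint \(\times\) Age interaction, and the two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the representation of bedtime and wake time as clock hours, the use of mean imputation, and the example values are all assumptions made for illustration, and the abstract does not state how sleep midpoint was derived from the NHANES variables.

```python
from statistics import mean, pstdev


def sleep_midpoint(bed_hour, wake_hour):
    """Midpoint of the sleep period as a clock hour in [0, 24).

    Assumes times are given as clock hours; the modulo handles
    sleep periods that cross midnight (e.g. 23:00 to 07:00).
    """
    duration = (wake_hour - bed_hour) % 24
    return (bed_hour + duration / 2) % 24


def midpoint_age_interaction(bed_hour, wake_hour, age):
    """Engineered interaction term: sleep midpoint multiplied by age."""
    return sleep_midpoint(bed_hour, wake_hour) * age


def fit_preprocessor(train_column):
    """Learn imputation and scaling statistics from TRAINING values only.

    Missing values are None. Mean imputation is an assumption here.
    """
    observed = [v for v in train_column if v is not None]
    center = mean(observed)
    spread = pstdev(observed) or 1.0  # guard against zero variance
    return center, spread


def transform(column, center, spread):
    """Impute and scale any split using statistics learned on training data."""
    return [((center if v is None else v) - center) / spread for v in column]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical values: statistics come from the training split and are
# then reused unchanged on the held-out split, so no test information
# reaches the preprocessing step.
train_feature = [1.0, None, 3.0]
test_feature = [None, 5.0]
center, spread = fit_preprocessor(train_feature)
train_scaled = transform(train_feature, center, spread)
test_scaled = transform(test_feature, center, spread)
```

In cross-validation the same pattern would be repeated inside each fold: `fit_preprocessor` is called on the fold's training portion only, and `transform` is applied to both that portion and the fold's validation portion.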
