Factors Influencing Vitamin D Status in Guiyang, China: A Random Forest and SHAP Analysis
Abstract
Objective
To assess serum 25-hydroxyvitamin D [25(OH)D] levels in a health examination population in Guiyang, a low-latitude, high-altitude, and cloudy city in southwestern China, and to identify key determinants using machine learning.
Methods
This retrospective study included 10,931 adults (aged >20 years) who underwent health checkups at Guiyang First People’s Hospital between February 2019 and April 2025. In addition to conventional statistical comparisons, a two-stage machine learning approach was applied: LASSO regression for feature selection, followed by a tuned Random Forest regression model (mtry = 2). SHapley Additive exPlanations (SHAP) were used to quantify variable importance.
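The two-stage approach described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses synthetic stand-in data rather than the study data, and invents the feature names, the integer coding of season, and the extra `noise` column. The only setting taken from the abstract is mtry = 2, which corresponds to `max_features=2` in scikit-learn. The SHAP step is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data (assumption): NOT the study data.
feature_names = ["age", "season", "sex", "bmi", "noise"]
X = np.column_stack([
    rng.uniform(20, 80, n),   # age in years
    rng.integers(0, 4, n),    # season as an integer code (a simplification)
    rng.integers(0, 2, n),    # sex as 0/1
    rng.normal(24, 3, n),     # BMI
    rng.normal(0, 1, n),      # pure noise column
])
# Hypothetical outcome loosely resembling 25(OH)D in nmol/L.
y = 20 + 0.4 * X[:, 0] + 3.0 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(0, 10, n)

# Stage 1: LASSO on standardized features; keep those with nonzero coefficients.
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
selected = [i for i, coef in enumerate(lasso.coef_) if coef != 0.0]
if not selected:  # fall back to all features if LASSO drops everything
    selected = list(range(X.shape[1]))

# Stage 2: Random Forest regression on the selected features.
# max_features plays the role of mtry (candidate features tried per split).
rf = RandomForestRegressor(
    n_estimators=100,
    max_features=min(2, len(selected)),
    random_state=0,
)
scores = cross_val_score(
    rf, X[:, selected], y, cv=5, scoring="neg_root_mean_squared_error"
)
cv_rmse = -scores.mean()

print("selected features:", [feature_names[i] for i in selected])
print("cross-validated RMSE:", round(cv_rmse, 3))
```

In a real analysis the value of mtry would normally be chosen by a grid search over cross-validated RMSE rather than fixed in advance, and categorical variables such as season would need a coding suited to the model.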
Results
The median serum 25(OH)D level was 36.63 nmol/L (IQR 24.77 to 53.17). Vitamin D deficiency (<50 nmol/L) was present in 70.98% of participants, while only 7.35% were vitamin D sufficient (>75 nmol/L). Significantly lower levels were observed in females, in adults aged <30 years (deficiency rate 85.6%), and during spring. The optimized Random Forest model achieved a cross-validated RMSE of 21.427. SHAP analysis revealed a clear hierarchy of importance: age (mean SHAP = 5.604) > season (4.104) > sex (1.533) ≈ BMI (1.501).
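The cutoffs quoted above can be applied as in the following short sketch. The function name, the label "intermediate" for values between the two cutoffs, and the sample values are illustrative assumptions. The abstract gives only the <50 nmol/L and >75 nmol/L thresholds.

```python
from collections import Counter


def classify_25ohd(nmol_per_l: float) -> str:
    """Classify a serum 25(OH)D value using the cutoffs quoted in the abstract."""
    if nmol_per_l < 50:
        return "deficient"
    if nmol_per_l > 75:
        return "sufficient"
    # 50 to 75 nmol/L inclusive; the label is an assumption, not from the source.
    return "intermediate"


def prevalence(values: list[float]) -> dict[str, float]:
    """Return the percentage of values falling in each category."""
    counts = Counter(classify_25ohd(v) for v in values)
    return {label: 100.0 * count / len(values) for label, count in counts.items()}


# Hypothetical sample values in nmol/L, for illustration only.
sample = [24.8, 36.6, 49.9, 50.0, 53.2, 75.0, 80.1, 30.0]
print(prevalence(sample))
```

Because both cutoffs are strict inequalities, values of exactly 50 or 75 nmol/L fall into the middle category under this reading.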
Conclusion
Vitamin D deficiency is highly prevalent in the Guiyang health examination population. Age and season are the dominant determinants, far outweighing sex and BMI. Targeted interventions should focus on young adults, females, and the spring season, especially in regions with similar cloudy highland climates.