Machine Learning-Based Prediction of Vitamin D and Calcium Deficiencies: A Retrospective Cross-Sectional Study in Nepal

Abstract

Background Vitamin D and serum calcium are essential for bone health and for maintaining mineral homeostasis. However, nutrient deficiency remains a major public health problem in countries such as Nepal, and it is exacerbated by increasing age and seasonal variation. The objective of this study was to develop and validate machine learning models that predict vitamin D and calcium levels from seasonal changes and demographic variables. A web-based platform was also designed for risk assessment and surveillance.

Methods Data from 8,181 individuals were analyzed to assess seasonal variation in vitamin D and serum calcium. Vitamin D status was classified according to the Endocrine Society guidelines. To predict serum calcium, five regression models were used: linear regression, random forest, XGBoost, multilayer perceptron (MLP), and gradient boosting. For vitamin D classification, random forest, XGBoost, MLP, gradient boosting, support vector machine (SVM), and ordinal logistic regression were employed. SHapley Additive exPlanations (SHAP) were used to assess feature importance, and decision curve analysis (DCA) was used to assess clinical utility. Finally, a real-time Streamlit application was deployed on Hugging Face Spaces for public access.

Results Vitamin D and serum calcium levels among the 8,181 participants showed significant seasonal variation. Vitamin D levels peaked in summer, whereas vitamin D deficiency increased during winter. A strong positive correlation was found between vitamin D and its categorical classification (ρ = 0.92, p < 0.001), whereas a moderate correlation was observed between vitamin D and serum calcium levels (ρ = 0.30, p < 0.001). Among the machine learning models, gradient boosting was the best predictor of serum calcium, with the highest R² (0.2078) and the lowest MAE (0.4169), whereas the MLP neural network performed the worst. For vitamin D classification, gradient boosting and XGBoost performed best in terms of accuracy (~99.7%) and AUC (~0.999). SHAP analysis identified vitamin D levels and seasonal variation as key predictors. A web-based prediction tool built on a random forest model was deployed on Hugging Face Spaces, providing real-time vitamin D and calcium predictions together with clinical recommendations.

Conclusion This study developed and validated machine learning models, using gradient boosting and XGBoost, to predict vitamin D and serum calcium levels in the Nepalese population. The findings also highlight seasonal and age-related trends in vitamin D and serum calcium levels. Web-based applications of this kind can support early diagnosis and preventive care in low-resource settings. Future studies should incorporate time series data and additional biomarkers to increase prediction accuracy.
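The regression results above are reported as R² and MAE. As a reading aid, the following minimal Python sketch shows how these two metrics are conventionally computed. The function names and the sample values are hypothetical and are not taken from the study; the unit (mg/dL) is also an assumption, since the abstract does not state it.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up serum calcium values (assumed mg/dL), for illustration only.
observed = [8.5, 9.0, 9.5, 10.0]
predicted = [9.0, 9.0, 9.5, 9.5]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE = {mae:.4f}, R^2 = {r2:.4f}")
```

Under this definition, an R² of about 0.21 means the model accounts for roughly a fifth of the variance in the observed values, while the MAE is expressed in the same unit as the predicted quantity.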