Risk factors for osteoporotic fractures in postmenopausal women: Evidence from the China Health and Nutrition Survey
Abstract
Objective To develop and validate a machine learning model that uses educational level and nutritional intake indicators to predict fractures associated with postmenopausal osteoporosis, and to clarify the role these factors play in prediction.
Methods Data were drawn from the China Health and Nutrition Survey, with a focus on nutritional intake and educational level. To improve model accuracy, body shape indicators, blood biochemical markers, and pain conditions were also included. Principal component analysis (PCA) was applied to reduce the dimensionality of the data and uncover underlying patterns. Six classifiers were built: decision tree, k-nearest neighbors (KNN), logistic regression, Gaussian naive Bayes, random forest, and support vector machine (SVM). Ten-fold cross-validation was used to evaluate and compare model performance and to guard against overfitting. SHapley Additive exPlanations (SHAP) values were then calculated to assess each feature's contribution to the predictions of the best-performing model.
Results The analysis included 1,157 participants, of whom 558 had experienced fractures related to postmenopausal osteoporosis. Following PCA, five key features were evaluated with the machine learning models. The random forest classifier achieved the highest accuracy (0.6695) and the highest area under the receiver operating characteristic curve (0.6852), with balanced sensitivity and specificity, both close to 68%. SHAP analysis indicated that educational level and nutritional intake indicators contributed most to the model's predictions.
Conclusion The random forest model proved to be the most effective tool for predicting the risk of fractures related to postmenopausal osteoporosis. The analysis using SHAP values underscored the significance of educational level and nutritional intake as key factors influencing the model's predictions.
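The model comparison described in the Methods can be illustrated with a minimal scikit-learn sketch. It is not the authors' code. The data are synthetic (generated with `make_classification`, not taken from the China Health and Nutrition Survey), the 20 input predictors are hypothetical, and the standardization step, the stratified folds, the choice of five principal components, and all hyperparameters are assumptions that the abstract does not specify. The function name `evaluate_models` is likewise introduced here for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): same sample size as the study,
# 20 hypothetical predictors, roughly balanced binary outcome.
X, y = make_classification(
    n_samples=1157, n_features=20, n_informative=6,
    weights=[0.52, 0.48], random_state=0,
)

# The six model families named in the abstract, with default-like
# hyperparameters (assumption: the abstract reports no settings).
MODELS = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gaussian_nb": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(),
}


def evaluate_models(X, y, n_components=5, n_splits=10, seed=0):
    """Return mean 10-fold accuracy and ROC AUC for each model.

    Scaling and PCA sit inside the pipeline, so they are refit on the
    training part of every fold and never see the held-out fold.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, clf in MODELS.items():
        pipe = make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)
        scores = cross_validate(pipe, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
        results[name] = {
            "accuracy": scores["test_accuracy"].mean(),
            "roc_auc": scores["test_roc_auc"].mean(),
        }
    return results


results = evaluate_models(X, y)
for name, metrics in sorted(results.items(), key=lambda kv: -kv[1]["roc_auc"]):
    print(f"{name:20s} acc={metrics['accuracy']:.4f} auc={metrics['roc_auc']:.4f}")
```

Because the inputs are synthetic, the printed scores say nothing about the reported results. The sketch only shows how the preprocessing can be kept inside each cross-validation fold while the six classifiers are compared on the same splits. The SHAP step is omitted here.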