Constructing Machine Learning-Based Risk Prediction Model for Osteoarthritis in Population Aged 45 and Above (NHANES 2011-2018)
Abstract
Background: Osteoarthritis is a widespread chronic joint disease that is increasingly prevalent among individuals over the age of 45. It causes joint pain and dysfunction and significantly disrupts patients’ daily lives. The objective of this study was therefore to develop an interpretable machine learning model for predicting the risk of osteoarthritis in individuals aged 45 and above.

Methods: This study used data from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018, including a total of 2980 individuals. The dataset was randomly divided into a training set (n=2235) and a validation set (n=745). Five machine learning algorithms were used to develop the predictive model for osteoarthritis. The SHapley Additive exPlanations (SHAP) method was used to interpret the models and identify the factors that contributed most to the prediction outcomes.

Results: The 2980 included individuals had an average age of 60 years, and 605 of them had been diagnosed with osteoarthritis. Twenty-four variables were included in the modeling with the five algorithms. After feature selection using Recursive Feature Elimination (RFE), the CatBoost model with 20 variables showed the best prediction performance. The most influential predictors were gender, age, BMI, waist circumference, and race.

Conclusion: This study demonstrates that the CatBoost model with 20 variables can effectively predict the occurrence of osteoarthritis.
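
The split and feature-selection steps described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic placeholders rather than NHANES records, scikit-learn's `GradientBoostingClassifier` stands in for CatBoost, the random seed and the AUC metric are arbitrary choices, and the SHAP interpretation step is omitted. Only the sample sizes (2980, 2235, 745) and the feature counts (24 reduced to 20) come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data (assumption): 2980 rows, 24 candidate
# predictors, roughly 20% positive cases. Not NHANES data.
X, y = make_classification(
    n_samples=2980, n_features=24, n_informative=8,
    weights=[0.8, 0.2], random_state=0,
)

# Random split with 745 validation rows, leaving 2235 for training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=745, random_state=0
)

# Recursive feature elimination from 24 down to 20 predictors.
# GradientBoostingClassifier is a stand-in for CatBoost (assumption).
selector = RFE(
    estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    n_features_to_select=20,
    step=1,
)
selector.fit(X_train, y_train)

# Validation AUC of the model refit on the selected features.
# The choice of AUC as the metric is an assumption.
val_auc = roc_auc_score(y_val, selector.predict_proba(X_val)[:, 1])
print(X_train.shape, X_val.shape, int(selector.support_.sum()), round(val_auc, 3))
```

In this sketch `selector.support_` is a boolean mask over the 24 columns marking the 20 retained predictors. With real data, the same mask could be used to look up the retained variable names.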