Development and Validation of a Machine Learning-Based Risk Prediction Model for Low Back Pain in Middle-Aged and Elderly Chinese: A SHAP-Interpretable Longitudinal Cohort Study
Abstract
Background: Low back pain is one of the most common health conditions among middle-aged and elderly people in China, which motivates the development of a risk prediction model for this population.

Methods: Participants aged 45 years and older were drawn from the China Health and Retirement Longitudinal Study (CHARLS) database. Key feature variables were selected by combining multivariable logistic regression analysis with Lasso regression. Seven machine learning (ML) algorithms were used to construct predictive models for low back pain, and SHapley Additive exPlanations (SHAP) was applied to interpret the models.

Results: A total of 1,904 participants were included, of whom 351 had developed low back pain at the endpoint. Nine variables were identified as predictors in the low back pain prediction model. All seven ML models showed good predictive performance. The Random Forest (RF) and K-Nearest Neighbors (KNN) models performed best (AUC = 0.997), and the XGBoost model also showed good predictive capability (AUC = 0.959). SHAP analysis indicated that the three most important features in the RF model were the CES-D score, self-rated health status, and walking speed. In the XGBoost model, walking speed, grip strength, and self-rated health status contributed the most to the predictions.

Conclusion: All seven ML models performed well, and SHAP analysis identified the CES-D score, walking speed, grip strength, and self-rated health status as the key contributing factors in the models.
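To make the SHAP step more concrete, the following is a minimal sketch of the Shapley value calculation that SHAP attributions are based on, computed exactly for a single individual. It is not the study's code. The function `shapley_values`, the toy scoring function `toy_risk`, its coefficients, and the feature values are all hypothetical and chosen only for illustration. Features left out of a coalition are replaced by a baseline value, which is one common convention and an assumption here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance `x` relative to `baseline`.

    `predict` takes a dict of feature values and returns a number.
    Features outside a coalition are set to their baseline value
    (an assumed convention for this sketch).
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        z = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(z)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for r in range(n):
            # Shapley weight for coalitions of size r that exclude feature i.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(z):
    # Hypothetical linear risk score; the coefficients are invented, not from the study.
    return 0.02 * z["cesd"] - 0.30 * z["walking_speed"] - 0.01 * z["grip_strength"]


# Hypothetical individual and reference (baseline) values.
x = {"cesd": 20, "walking_speed": 0.6, "grip_strength": 22}
baseline = {"cesd": 8, "walking_speed": 1.0, "grip_strength": 30}

phi = shapley_values(toy_risk, x, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the individual and the prediction at the baseline, which is the additivity property that gives SHAP its name. Exact enumeration grows exponentially with the number of features, so practical SHAP implementations rely on faster model-specific or approximate algorithms.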