Machine Learning-Based Prediction of Depression Risk in Patients with Multimorbidity: A Study Using NHANES Data from 2009-2016
Abstract
OBJECTIVE Patients with multimorbidity are at significantly higher risk of developing depression than those without multimorbidity, and depression can severely impair psychosocial functioning and decrease quality of life. However, no effective method currently exists to assess depression risk in these patients. In this study, we used data from the National Health and Nutrition Examination Survey (NHANES) and applied machine learning (ML) techniques to construct a model that predicts the likelihood of depression in patients with multimorbidity.
METHODS Data from the 2009–2016 NHANES cycles were used, incorporating a wide range of demographic and clinical variables in the analysis. Various machine learning algorithms were evaluated for their predictive performance, and the CatBoost classifier, which demonstrated the best performance, was selected to build the prediction model. The model’s predictions were interpreted using SHapley Additive exPlanations (SHAP) values, which ranked feature importance and visualized the contribution of variables such as socioeconomic status, biological factors, and lifestyle habits to predicting depression among multimorbid patients.
RESULTS The predictive performance of the model was robust for both overall depression and the binary classification of moderate to severe depression. For moderate to severe depression classification, the model achieved an accuracy of 0.9541, an AUC of 0.9903, and an F1 score of 0.954. The precision and recall were 0.9475 and 0.9607, respectively, with a kappa value of 0.9081 and an MCC of 0.9083. SHAP plots revealed that age and socioeconomic status were among the most important predictors for both depression and moderate to severe depression classifications.
CONCLUSION This study developed a machine learning model to predict depression risk in patients with multimorbidity using NHANES data. The model demonstrated excellent predictive performance, and SHAP plots highlighted key predictors such as age and socioeconomic status, which strongly influenced the prediction of depression and its severity.
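The classification metrics reported above (accuracy, F1, recall, kappa, MCC) can all be derived from a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' evaluation code; the function names `binary_metrics` and `f1_from` and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Matthews correlation coefficient (MCC).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
        "mcc": mcc,
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; these are NOT the study's data.
m = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# Harmonic mean of the two values 0.9475 and 0.9607 quoted in the abstract.
print(f"harmonic mean: {f1_from(0.9475, 0.9607):.4f}")
```

The harmonic mean of 0.9475 and 0.9607 is approximately 0.954, which agrees with the reported F1 score. Kappa and MCC tend to be close to each other when the classes and the predictions are roughly balanced, which is consistent with the near-identical values of 0.9081 and 0.9083.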