Identifying Key Predictive Features for Opioid Use Disorder Using Machine Learning
Abstract
Opioid Use Disorder (OUD) continues to pose a pressing public health challenge across the United States, highlighting the critical need for early and accurate risk assessment tools that facilitate prompt prevention and intervention efforts. Machine learning methods have emerged as valuable tools for parsing complex medical datasets and aiding clinical decisions. However, their effectiveness and interpretability depend largely on the appropriateness and quality of the selected input variables. In this work, we compared three distinct feature selection strategies, namely Alternating Decision Tree (ADT)-based scoring, Cross-Validated Feature Evaluation (CVFE), and Hypergraph-Based Feature Evaluation (HFE), to identify the most predictive indicators of OUD. The analysis used data from the 2023 National Survey on Drug Use and Health (NSDUH), a dataset compiled by RTI International under the direction of the Substance Abuse and Mental Health Services Administration (SAMHSA). This dataset encompasses a broad spectrum of features related to demographics, behavior, mental health, and substance use. Each feature selection method yielded a set of important predictors, which were then used to train eXtreme Gradient Boosting (XGBoost) classification models. The performance of these models was evaluated and compared; the model trained on CVFE-selected features performed best, with a predictive accuracy of 79.11% and an area under the receiver operating characteristic (ROC) curve of 0.8652. To enhance model transparency and interpretability, SHapley Additive exPlanations (SHAP) were employed to illustrate the influence of individual variables on model predictions.
The findings highlight the importance of effective feature selection for both model accuracy and interpretability, supporting the development of practical, data-driven approaches that may help healthcare providers assess OUD risk and tailor prevention strategies to individual needs.
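The two metrics reported in the abstract, classification accuracy and area under the ROC curve, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `accuracy` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed through its pairwise-ranking interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case).

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases whose thresholded score matches the true label.

    The 0.5 threshold is an assumption; the source does not state one.
    """
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """ROC AUC via pairwise ranking of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive case is
    scored higher, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels and model scores, invented for illustration (not NSDUH data).
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.6, 0.7]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(y_true, y_score):.4f}")
    print(f"ROC AUC  = {roc_auc(y_true, y_score):.4f}")
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason the two figures are commonly reported together.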