Identifying Key Predictive Features for Opioid Use Disorder Using Machine Learning

Suraiya Akhter
John H. Miller

Abstract

Background

Opioid Use Disorder (OUD) continues to pose a pressing public health challenge across the United States, highlighting the critical need for early and accurate risk assessment tools that facilitate prompt prevention and intervention efforts. Machine learning methods have emerged as valuable tools for parsing complex medical datasets and aiding in clinical decisions. However, their effectiveness and interpretability largely rely on the appropriateness and quality of the selected input features.

Objective

In this work, we conducted a comprehensive comparison of three distinct feature selection strategies, namely Alternating Decision Tree (ADT)-based scoring, Cross-Validated Feature Evaluation (CVFE), and Hypergraph-Based Feature Evaluation (HFE), to identify the most predictive indicators of OUD.

Methods

The analysis was performed using data from the 2023 National Survey on Drug Use and Health (NSDUH), a dataset compiled by RTI International under the direction of the Substance Abuse and Mental Health Services Administration (SAMHSA). This dataset encompasses a broad spectrum of features related to demographics, behavior, mental health, and substance usage. Each feature selection method yielded a set of important predictors, which were subsequently used to train eXtreme Gradient Boosting (XGBoost) classification models. To enhance model transparency and interpretability, SHapley Additive exPlanations (SHAP) was employed to illustrate the influence of individual variables on model predictions.

Results

The performance of the models was evaluated and compared. The model informed by CVFE-selected features achieved the best outcomes: a predictive accuracy of 79.11% and an area under the curve (AUC) of 0.8652. The top 10 most influential features, based on SHAP value rankings from the best-performing model, were past-year misuse of pain relievers, recent alcohol use disorder, age group, history of asthma, receipt of substance use treatment in the past year, educational attainment, household size, total household income, marital status, and race/ethnicity. The web application, accessible via https://shiny.tricities.wsu.edu/oud-prediction/, offers prediction outcomes, probability metrics, and a SHAP visualization generated from the best model, which was built using the cross-validation-based approach.

Conclusions

The findings highlight the crucial importance of effective feature selection in enhancing both model accuracy and interpretability, ultimately supporting the development of practical, data-driven approaches that may help healthcare providers assess OUD risk and tailor prevention strategies to individual needs.

Trial registration

Not applicable, as this research is not a clinical trial.
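
To make the idea behind the SHAP attributions in the abstract more concrete, the following is a minimal sketch of exact Shapley values for a single prediction. It is not the authors' code and not the algorithm used by the SHAP library for tree models; the function `shapley_values`, the scoring function `toy_risk`, its coefficients, and the all-zeros baseline are hypothetical and chosen only for illustration. Features outside a coalition are simply set to their baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features that are not in a coalition take their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for very small toy examples.
    """
    n = len(x)

    def value(coalition):
        # Build an input where only the coalition's features come from x.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score (illustrative coefficients, not from the paper)."""
    pain_reliever_misuse, alcohol_use_disorder, prior_treatment = z
    return (
        0.10
        + 0.40 * pain_reliever_misuse
        + 0.20 * alcohol_use_disorder
        + 0.10 * pain_reliever_misuse * alcohol_use_disorder  # interaction term
        + 0.05 * prior_treatment
    )


x = [1, 1, 1]          # hypothetical respondent with all three indicators present
baseline = [0, 0, 0]   # hypothetical reference respondent with none of them
phi = shapley_values(toy_risk, x, baseline)
```

In this toy setup the interaction term is split evenly between the two interacting features, so the attributions come out to approximately 0.45, 0.25, and 0.05, and they sum to the difference between the score for `x` and the score for `baseline`. That additivity is the property that lets SHAP-style plots decompose a single prediction into per-feature contributions.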

Version published to 10.1101/2025.07.12.25331446 on medRxiv
Jul 15, 2025

Related articles

Tools for Helping Identify Behavior Disorders: Comparing Bayesian Evidence-Based and Machine Learning Approaches

This article has 7 authors:
1. Yinuo Liu
2. Eric Arden Youngstrom
3. Caroline Bodary
4. Zhuoyu Shi
5. Jennifer Youngstrom
6. Ekaterina Stepanova
7. Robert L. Findling
This article has no evaluations. Latest version: Dec 12, 2025

Concise Comprehensive Assessment of Psychiatric Disorder Risks Using Machine Learning

This article has 10 authors:
1. Yuan Hong Sun
2. Jianli Zhu
3. Fanqiang Meng
4. Qijian Liu
5. Queenny Chiu
6. Nathan Y. Lee
7. Junbang Zhao
8. Xinpeng Xu
9. Xiaohong Li
10. Kang Lee
This article has no evaluations. Latest version: Jan 23, 2026

Unravelling the Complexity Gap: A Mechanistic Investigation of Machine Learning Classification in Panic Disorder

This article has 2 authors:
1. Filipe Ricardo Carvalho
2. Ana Teresa Martins
This article has no evaluations. Latest version: Jan 8, 2026