Advancing Predictive Modeling in Behavioral Health: A Comparative Evaluation of Gradient-Boosted Bootstrap Model, Random Forest, and Logistic Regression Approaches to Treatment Completion
Abstract
Treatment completion in substance use disorder (SUD) programs is a strong predictor of positive long-term health outcomes, yet many clients disengage prematurely. Traditional statistical models often struggle to account for the complex, non-linear factors influencing treatment retention. This evaluation compares the predictive performance and probability calibration of three models, Logistic Regression, Random Forest, and a novel Gradient-Boosted Bootstrap Model (GBBM), to determine the most effective approach for forecasting treatment completion. Data from 1,158 adults enrolled in a Colorado-based SUD treatment program were analyzed. Models were trained using an 80/20 stratified split and evaluated using metrics such as accuracy, F1-score, recall, ROC AUC, and Brier score. A dual-level bootstrap framework was used to assess performance stability, with a special focus on correctly identifying treatment completers (Class 1), a clinically significant minority. GBBM outperformed both the logistic regression and Random Forest baselines in several areas, achieving the highest recall (0.855), the highest F1-score (0.837), and the lowest Brier score (0.076). Logistic regression was more stable across resamples but lagged slightly in sensitivity and calibration. Random Forest showed the weakest overall performance, especially in identifying treatment completers. The GBBM offered a favorable trade-off between predictive accuracy and probabilistic calibration, particularly in detecting clients who complete treatment. These findings highlight the value of ensemble-based machine learning in behavioral health applications while underscoring the continued relevance of interpretable models like logistic regression.
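For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how recall, F1-score, and Brier score for the positive class (Class 1) can be computed, and how a simple bootstrap can gauge their stability. It is not the authors' implementation: the abstract does not specify the dual-level bootstrap, so this sketch uses an ordinary single-level percentile bootstrap over test-set predictions. The function names, the 0.5 decision threshold, and the toy labels and probabilities are all illustrative assumptions and are unrelated to the study data.

```python
import random

def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def recall_f1(y_true, y_pred, positive=1):
    """Recall and F1-score for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yh in pairs if y == positive and yh == positive)
    fp = sum(1 for y, yh in pairs if y != positive and yh == positive)
    fn = sum(1 for y, yh in pairs if y == positive and yh != positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1

def bootstrap_interval(y_true, y_prob, metric, n_resamples=1000, seed=0):
    """Percentile interval (2.5%, 97.5%) of a metric over resampled test cases.

    Assumption: a plain single-level bootstrap, not the paper's dual-level scheme.
    """
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Draw n test cases with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([y_true[i] for i in idx], [y_prob[i] for i in idx]))
    values.sort()
    return values[int(0.025 * n_resamples)], values[int(0.975 * n_resamples) - 1]

# Toy example (made-up values): 1 = completed treatment, 0 = did not complete.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # hypothetical 0.5 threshold

brier = brier_score(y_true, y_prob)
recall, f1 = recall_f1(y_true, y_pred)
brier_lo, brier_hi = bootstrap_interval(y_true, y_prob, brier_score)
```

A lower Brier score indicates better-calibrated probabilities, while recall measures the share of actual completers that the model identifies. A narrow bootstrap interval suggests a metric is stable across resamples, which is the sense in which the abstract describes logistic regression as more stable.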