Considerations for evaluating the practical utility of machine learning in suicide risk estimation: the role of cost and equity
Abstract
A key vulnerability in modeling suicide death is a lack of precision; as a result, risk estimates are often regarded as ultimately unhelpful to clinicians, even when produced by more advanced or nuanced machine learning (ML) techniques. We sought to fill several conceptual gaps by assessing performance across multiple techniques, focusing on the precision-recall tradeoff, with ad hoc contextualization for sensitivity, cost-balance, and fairness. Our aim was to identify robust, differential performance across a cross section of ML techniques on a suicide risk task, emphasizing overall AUPRC maximization and downstream effects on hypothetical decision support. A retrospective cohort was selected of patients who received care, or who died according to the Office of the Chief Medical Examiner (OCME), between 2017 and 2020, using the Maryland Suicide Datawarehouse (MSDW). AUPRC-optimized settings yielded cross-validated AUPRC values that were significantly improved over logistic regression, especially for XGBoost, in both hospital discharge records (AUPRC: 0.667; PPV: 0.941) and commercial claims records (AUPRC: 0.558; PPV: 0.857). F-beta statistics revealed that when precision is preferred (e.g., the 99.9th percentile), XGBoost is among the most efficient tools, while random forest and MLP perform better when sensitivity is preferred (90th percentile or lower). No algorithmic bias was identified by age, sex, or race, but significant changes in performance were noted with certain clinical characteristics. To our knowledge, this is the first use of AUPRC-maxima optimization for ML tools predicting suicide death. We discuss how the utility of suicide risk models in clinical decision support is tied to innate class imbalance challenges in model training, and we provide recommendations on how to better evaluate performance.
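The two evaluation ideas named in the abstract, area under the precision-recall curve (AUPRC) and F-beta at a percentile cutoff, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the step-wise (average precision) estimate of AUPRC, the nearest-rank percentile rule, and the synthetic data are all assumptions made for illustration.

```python
import random


def average_precision(y_true, scores):
    """Step-wise estimate of AUPRC (average precision); ties follow sort order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positive cases")
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each positive case
    return ap / total_pos


def fbeta_at_percentile(y_true, scores, percentile, beta):
    """Flag cases scoring at or above a percentile cutoff; return (precision, recall, F-beta)."""
    ranked = sorted(scores)
    # Nearest-rank cutoff (an assumed convention, not taken from the paper).
    k = min(len(ranked) - 1, int(len(ranked) * percentile / 100.0))
    cutoff = ranked[k]
    tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < cutoff and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


def demo():
    """Synthetic, hypothetical data only: about 1% positives with higher average scores."""
    rng = random.Random(0)
    y = [1 if rng.random() < 0.01 else 0 for _ in range(5000)]
    s = [rng.gauss(2.0 if label else 0.0, 1.0) for label in y]
    print("AUPRC (average precision):", round(average_precision(y, s), 3))
    # beta < 1 weights precision more heavily; beta > 1 weights recall (sensitivity).
    for pct, beta in [(99.9, 0.5), (90.0, 2.0)]:
        p, r, f = fbeta_at_percentile(y, s, pct, beta)
        print(f"percentile={pct} beta={beta} precision={p:.3f} recall={r:.3f} F={f:.3f}")


if __name__ == "__main__":
    demo()
```

Raising the percentile cutoff flags fewer patients, which generally trades recall for precision. Choosing beta below 1 rewards the precision-oriented setting, and choosing beta above 1 rewards the sensitivity-oriented one.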