Predicting suicidal ideation from depression screening data: A network-augmented machine learning approach
Abstract
Aims. Suicide is a critical public health concern, with suicidal ideation serving as a key precursor to suicidal behavior. However, during routine health screenings, individuals at risk do not always disclose suicidal thoughts, potentially leading to missed opportunities for early intervention. Models capable of identifying elevated suicide risk without relying on direct questions could help fill this gap and enhance prevention efforts. Depression is one of the most robust predictors of suicide risk, and routinely collected depressive symptom data provide a promising indirect pathway to detection. Large, population-based datasets enable the construction of normative symptom patterns, against which atypical configurations, potentially indicative of hidden risk, can be identified. Although machine learning is well suited for capturing complex, nonlinear relationships, most models lack transparency regarding how individual symptoms interact. To address this limitation, we integrated machine learning with individualized symptom network analysis to develop an interpretable and accurate approach for predicting suicidal ideation, even in the absence of direct suicide-related questions.
Methods. We trained and tested logistic regression, random forest, and XGBoost models using the U.S. National Health and Nutrition Examination Survey (NHANES; N = 44,922 adults). Predictors included PHQ-9 items 1-8 and individualized symptom-network features (strength centrality, pairwise edge weights, and network density). Suicidal ideation was defined as PHQ-9 item 9 ≥ 1. Model performance was evaluated using precision-recall area under the curve (PR AUC), precision, recall, and specificity. Feature importance was assessed with Boruta and SHapley Additive exPlanations (SHAP). Generalizability was tested across five independent cohorts (N = 808,023), including college students, medical interns, psychiatric outpatients, a bipolar research cohort, and a nationally representative Korean sample.
Results. In NHANES, the network-augmented random forest model demonstrated the strongest performance (PR AUC = 0.90; precision = 0.99; recall = 0.90; specificity = 0.70). Key features included the centrality and severity of depressed mood (item 2), the centrality of guilt/worthlessness (item 6), overall network density, and edges linking anhedonia-fatigue (items 1 and 4) and sleep disturbance-fatigue (items 3 and 4). External validation yielded high precision (≥ 95%) and PR AUCs ranging from 0.66 to 0.91 across all cohorts.
Conclusions. Integrating symptom-network features into machine learning models enhances interpretability without compromising predictive accuracy. This approach may facilitate earlier identification of individuals at risk for suicidal ideation in both large-scale screenings and public health programs, enabling more timely and targeted suicide prevention strategies.
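As an illustration of the outcome definition and evaluation metrics named in the Methods, the following is a minimal sketch, not the authors' code. It derives the binary label from PHQ-9 item 9 and computes precision, recall, specificity, and a step-wise PR AUC (average precision) in plain Python. The toy scores, the 0.5 decision threshold, and all function and variable names are hypothetical. The abstract does not state how PR AUC was integrated, so average precision is an assumption, and this version assumes no tied scores.

```python
from typing import Dict, Sequence


def ideation_label(item9_score: int) -> int:
    """Binary outcome as defined in the Methods: PHQ-9 item 9 >= 1."""
    return 1 if item9_score >= 1 else 0


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and specificity from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Assumption: scores contain no ties; tied scores would need grouping.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive case
    return total / n_pos


# Hypothetical toy data: item 9 responses and model risk scores for 8 people.
item9 = [0, 1, 0, 2, 0, 3, 0, 0]
scores = [0.10, 0.90, 0.40, 0.80, 0.20, 0.30, 0.70, 0.05]

y_true = [ideation_label(s) for s in item9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold

metrics = screening_metrics(y_true, y_pred)
ap = average_precision(y_true, scores)
print(metrics, round(ap, 3))
```

PR AUC is reported alongside specificity because it does not count true negatives, so it is not inflated by the large majority of respondents who have no ideation, which typically makes it more informative than ROC AUC when the positive class is rare.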