Predicting Adolescent Suicidal Tendency in Chinese Secondary School Students: A Machine Learning Approach with XGBoost and SHAP Interpretation
Abstract
Background: Adolescent suicide is a critical public health issue globally. Early detection of suicidal tendency remains challenging due to its concealed and multidimensional nature. This study aimed to develop and validate an interpretable machine learning model to predict suicidal tendency among Chinese secondary school students.

Methods: A cross-sectional survey was conducted among 12,063 students from Suzhou, China. A total of 23 variables, including demographic, psychological, and behavioral factors, were collected. Seven machine learning models (LR+LASSO, LightGBM, SVM, KNN, DT, RF, and XGBoost) were developed and compared using 5-fold cross-validation. Model performance was evaluated using AUC, sensitivity, specificity, calibration curves, and decision curve analysis. Feature importance was interpreted using SHAP values.

Results: Among the participants, 21.98% exhibited suicidal tendency. XGBoost outperformed other models on the validation set, achieving an AUC of 0.802 (95% CI: 0.785–0.818), sensitivity of 0.686, specificity of 0.758, and a negative predictive value of 0.892. The top three predictors were depressed mood (PHQ2), self-dissatisfaction (PHQ6), and reluctance to seek help. SHAP analysis revealed that male students with high distress and low help-seeking intent constituted a high-risk subgroup.

Conclusion: The XGBoost-based model demonstrates strong predictive ability and clinical interpretability for identifying adolescents at risk of suicide. It highlights the importance of integrating psychological and behavioral factors in school-based screening programs, particularly for under-recognized subgroups such as distressed males who avoid seeking help.
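The screening metrics named in the Methods and Results (AUC, sensitivity, specificity, negative predictive value) can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. The last function relates negative predictive value to sensitivity, specificity, and prevalence through Bayes' rule.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and negative predictive value."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases that are cleared
        "npv": tn / (tn + fn),          # share of cleared students who are non-cases
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a toy example but slow for large samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def npv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value implied by the three rates (Bayes' rule)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Toy data (hypothetical, not from the study): 3 cases, 5 non-cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

toy_metrics = screening_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)

# Rates taken from the abstract; using the whole-sample prevalence is an assumption.
implied_npv = npv_from_rates(sensitivity=0.686, specificity=0.758, prevalence=0.2198)
```

With the abstract's sensitivity and specificity and the whole-sample prevalence of 21.98%, `implied_npv` comes out at about 0.895, close to the reported 0.892. The abstract does not give the validation-set counts, so a small gap of this kind is expected if prevalence in that set differs slightly from the full sample.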