Identifying High-Risk Adolescents for Mental Health Difficulties: A Machine Learning Analysis of the Health Behaviour in School-aged Children Study Across 46 Countries

Abstract

Background: Adolescent mental health represents a global public health crisis, yet traditional surveillance methods lack the scalability and predictive power needed for effective early identification. This study aimed to develop and validate a machine learning framework to predict multiple psychosomatic health complaints (MPHC), a key indicator of mental distress, using large-scale, multi-national population survey data.
Methods: Data were drawn from the 2017-2018 Health Behaviour in School-aged Children (HBSC) survey, comprising an analytical sample of 225,421 adolescents aged 11, 13, and 15 years from 46 countries. The primary outcome was daily MPHC. The dataset was partitioned into training (80%) and testing (20%) sets, and eight distinct machine learning algorithms were developed and evaluated. Analyses were stratified by sex and school grade. Shapley Additive Explanations (SHAP) analysis was applied to the optimal models to identify the most important predictors of mental health risk.
Results: Machine learning models demonstrated robust discriminatory performance for identifying adolescents with daily MPHC, with area under the receiver operating characteristic curve (AUC) values ranging from 0.76 to 0.79 across subgroups. Model performance peaked for girls in Grade 7 (AUC = 0.79; 95% CI: 0.78-0.80). SHAP analysis revealed that modifiable psychosocial factors were the most powerful predictors. High academic pressure, problematic social media use, and low family support consistently emerged as the top predictors across all subgroups. The analysis uncovered distinct, sex-differentiated risk architectures: for boys, physical fighting was a uniquely persistent and high-impact predictor across all grades, while for girls, difficulties with parental communication and academic pressure were particularly salient, especially during mid-adolescence.
Conclusions: Machine learning applied to standardized population survey data offers a scalable, accurate paradigm for predictive public health screening of adolescents at mental health risk. The findings challenge a "one-size-fits-all" approach, providing a data-driven mandate for the design of developmentally timed and sex-specific interventions that target the distinct psychosocial risk factors shaping adolescent well-being.
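
Illustrative sketch: the abstract reports discrimination as an AUC with a 95% confidence interval but does not say how the interval was obtained. The Python below shows one common way such a figure could be computed on a held-out test set, assuming a percentile bootstrap. The function names (`auc`, `bootstrap_auc_ci`), the parameter defaults, and the synthetic labels and scores are all hypothetical and are not taken from the study or from HBSC data.

```python
import random


def auc(labels, scores):
    """AUC via the Mann-Whitney rank-sum identity, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(round(alpha / 2 * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


# Synthetic stand-in for test-set outcomes and model scores (not real data).
_rng = random.Random(42)
demo_labels = [1 if _rng.random() < 0.3 else 0 for _ in range(400)]
demo_scores = [_rng.gauss(1.0 if y else 0.0, 1.0) for y in demo_labels]

point = auc(demo_labels, demo_scores)
ci_low, ci_high = bootstrap_auc_ci(demo_labels, demo_scores)
print(f"AUC = {point:.2f}; 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

In a stratified analysis like the one described, the same two functions would be applied separately to each sex-by-grade subgroup of the test set.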
