Interpretable Machine Learning Models for Childhood and Adolescent Obesity Prediction According to Chinese and WHO Standards: A Two-Wave Cross-Sectional Study
Abstract
Objective: This study aimed to develop and validate interpretable machine learning models that predict childhood and adolescent obesity from multi-domain risk factors, and to deploy these models as an accessible online tool for clinical and public health use.

Methods: Data were derived from two waves of cross-sectional surveys (2022 and 2024) conducted in Pinggu District, Beijing, involving 22,555 children and adolescents aged 3–18 years. Obesity was defined according to both Chinese and World Health Organization (WHO) standards. Thirty-eight features across five domains (demographic, fetal life, lifestyle, health status, and family factors) were analyzed. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression. Four machine learning models (K-nearest neighbors, LightGBM, neural network, and XGBoost) were trained and evaluated on 28 performance metrics. Model predictions were interpreted using SHapley Additive exPlanations (SHAP). The best-performing models were deployed as web-based applications.

Results: Five features were selected for predicting obesity under Chinese standards and twelve under WHO standards. Age, maternal BMI, paternal BMI, screen time, and birth weight were consistently important across both standards. The neural network model performed best under Chinese standards (AUC = 0.7352), while XGBoost performed best under WHO standards (AUC = 0.7358). SHAP analysis provided global and local interpretations of feature contributions. User-friendly online prediction tools were developed and made publicly available.

Conclusion: This study developed interpretable machine learning models that predict childhood and adolescent obesity (AUC of approximately 0.74 under both standards) using a small set of clinically relevant features. The deployed tools facilitate individualized risk assessment and may support targeted prevention strategies.
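To make the idea of a local SHAP interpretation concrete, the following minimal sketch computes exact Shapley values for a single prediction. It is an illustration under stated assumptions, not the study's pipeline: the logistic model, its coefficients, the feature subset, and the two example records are all hypothetical, and features absent from a coalition are simply set to a single reference point, whereas the SHAP library typically averages over a background dataset.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical toy risk model, NOT the fitted model from the study.
# Feature names and coefficients are invented for illustration only.
WEIGHTS = {"age": 0.05, "maternal_bmi": 0.08, "screen_time": 0.30}
INTERCEPT = -4.0


def predict(x):
    """Logistic risk score for a dict of feature values."""
    z = INTERCEPT + sum(WEIGHTS[k] * x[k] for k in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(x, baseline):
    """Exact Shapley values of predict() at x relative to a baseline point.

    Features outside a coalition take their baseline values.
    """
    names = list(WEIGHTS)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


# Hypothetical individual and reference record (made-up values).
child = {"age": 10, "maternal_bmi": 30.0, "screen_time": 4.0}
reference = {"age": 10, "maternal_bmi": 22.0, "screen_time": 1.0}

phi = shapley_values(child, reference)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the individual's predicted risk and the reference prediction, which is the additivity property that lets SHAP attribute one prediction to individual features. Exact enumeration scales exponentially with the number of features, so it is practical only for small feature sets such as the five or twelve selected here; tree models like XGBoost are usually explained with faster specialized algorithms.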