NSCH-Flourishing-ML: A Curated Dataset and Reproducible Pipeline for Machine Learning Analysis of Child Flourishing

Miguel Arcos-Argudo
Rodolfo Bojorque
Fernando Pesántez
Kely Nieto-Andrade

Abstract

Large-scale population surveys provide valuable information for studying child well-being, yet their structure often limits direct application of machine learning methods. The National Survey of Children’s Health (NSCH) is one of the most comprehensive datasets for monitoring children’s health and development in the United States, but the raw survey files contain skip patterns, categorical variables, and complex survey design elements that require substantial preprocessing before predictive analysis can be performed. This study presents a curated machine-learning-ready dataset derived from the 2023 NSCH survey together with a fully reproducible computational pipeline for studying child flourishing. The pipeline constructs a binary flourishing indicator based on four survey items capturing curiosity, persistence, emotional regulation, and engagement in learning. After removing skip codes and missing responses, 1,978 valid observations were retained from the original dataset of more than 55,000 records. Feature selection using mutual information was applied to produce a reduced set of interpretable predictors suitable for benchmarking and educational use. Baseline experiments using logistic regression and random forest models show moderate predictive performance, suggesting that child flourishing cannot be accurately predicted using demographic and household variables alone. A methodological comparison between weighted and unweighted models further shows that incorporating survey weights consistently reduces predictive performance. By releasing both the curated dataset and the reproducible pipeline, this study provides a reusable resource for machine learning research on child well-being.
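The first steps described in the abstract (dropping skip codes and missing responses, building a binary flourishing indicator from four items, and ranking predictors by mutual information) can be illustrated with a minimal sketch. It is not the authors' pipeline: the item names, the skip codes, the "positive response" rule, and the toy records below are all assumptions made for illustration, and the real NSCH variable names and codes should be taken from the survey codebook. Mutual information is computed here directly for discrete variables using only the standard library.

```python
import math
from collections import Counter

# Hypothetical item names and codes (assumptions, not NSCH variable names).
ITEMS = ("curiosity", "persistence", "emotion_regulation", "learning_engagement")
SKIP_CODES = {90, 95, 96, 99}  # assumed placeholder skip/missing codes
POSITIVE = {1, 2}              # assumed: 1 = "always", 2 = "usually"


def clean(records):
    """Keep only records whose four items are present and not skip codes."""
    return [
        r for r in records
        if all(r.get(k) is not None and r[k] not in SKIP_CODES for k in ITEMS)
    ]


def flourishing(record):
    """Binary indicator: 1 if every item has a positive response, else 0."""
    return int(all(record[k] in POSITIVE for k in ITEMS))


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(records, features):
    """Rank candidate predictors by mutual information with the label."""
    labels = [flourishing(r) for r in records]
    scores = {
        f: mutual_information([r[f] for r in records], labels)
        for f in features
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def make(c, p, e, l, income_band, region):
    """Build one toy record (purely synthetic data)."""
    return {
        "curiosity": c, "persistence": p, "emotion_regulation": e,
        "learning_engagement": l, "income_band": income_band, "region": region,
    }


toy = [
    make(1, 1, 2, 1, income_band=3, region=1),
    make(1, 2, 1, 1, income_band=3, region=2),
    make(3, 1, 1, 1, income_band=1, region=1),
    make(4, 3, 2, 1, income_band=1, region=2),
    make(1, 1, 99, 1, income_band=2, region=1),    # dropped: skip code
    make(1, None, 1, 1, income_band=2, region=2),  # dropped: missing item
]

valid = clean(toy)
ranking = rank_features(valid, ["income_band", "region"])
print(len(valid), ranking)
```

On this toy data, four of the six records survive cleaning, and the hypothetical `income_band` feature ranks above `region` because it separates the two label values perfectly while `region` carries no information about them. With real survey data, a library estimator such as scikit-learn's `mutual_info_classif` could be used instead; the sketch does not cover model fitting or the weighted versus unweighted comparison.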

Version published to 10.20944/preprints202603.1867.v1 on Mar 24, 2026

A Next-Generation NLP Framework for Psychological Behavior Analysis Based on State-of-the-art Language Model

This article has 6 authors:
1. Mohit Kumar
2. Ashwani Kumar
3. Avinash Kumar Sharma
4. Nishant Gupta
5. Achyut Shankar
6. Gautam Kumar
This article has no evaluations. Latest version: Apr 6, 2026

Machine learning-based predictive modeling of decline in intrinsic capacity among migrant older adults with children

This article has 8 authors:
1. Qinghua Zhang
2. Yu Wang
3. Xiaoxiao Hu
4. Xinuo Yao
5. Yuhan Yang
6. Danyan Lu
7. Shengguang Chen
8. Xiaoyu Chen
This article has no evaluations. Latest version: Feb 19, 2026

Application of Machine Learning to Predict Teenage Pregnancy in Zambia: Evidence from 2024 Zambia Demographic and Health Surveys

This article has 3 authors:
1. Teebeny Zulu
2. Nasson Nathan Tembo
3. Patrick Musonda
This article has no evaluations. Latest version: Feb 27, 2026
