A Supervised Machine Learning Approach to Classifying ADHD in a Diverse Child fNIRS Dataset: Evidence from Bilingual and Monolingual Language Environments

Abstract

Attention-Deficit/Hyperactivity Disorder (ADHD) has long been a challenge in developmental, educational, and cognitive neuroscience, demanding advances in both theoretical understanding and practical diagnostic approaches. In recent years, functional Near-Infrared Spectroscopy (fNIRS) has emerged as a powerful, non-invasive neuroimaging tool for examining children’s cortical hemodynamics during language and literacy tasks. Meanwhile, new machine learning (ML) methodologies have shown promise in extracting clinically relevant patterns from large, heterogeneous neuro-behavioral datasets. However, most applications of ML in clinical child populations have focused on well-known modalities such as Magnetic Resonance Imaging (MRI) or electroencephalography (EEG). Few studies leverage fNIRS data combined with extensive behavioral and demographic information, especially in children from diverse linguistic backgrounds.

Here, we present a systematic supervised ML approach to classifying ADHD in children aged 5–11. We rely on an openly available fNIRS dataset from the University of Michigan’s Deep Blue Data repository, titled "Morphological and phonological processing in English monolingual, Chinese-English bilingual, and Spanish-English bilingual children: An fNIRS neuroimaging dataset". This dataset comprises 343 children, including English monolinguals, Chinese-English bilinguals, and Spanish-English bilinguals (1). It incorporates broad neuroimaging and behavioral measures of morphological and phonological awareness, reading proficiency, and demographic questionnaires. Our ML pipeline employs a range of algorithms (Logistic Regression, Support Vector Classification (SVC), Random Forest, Gradient Boosting, and XGBoost), each tuned via cross-validation. The best-performing model, SVC, achieved a macro F1 score of 0.625 in 5-fold cross-validation and near-perfect performance for the majority class on the final test set.
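
As a rough illustration of the model comparison described above, the following sketch tunes several classifiers with 5-fold cross-validation and ranks them by macro F1 using scikit-learn. It is not the authors' pipeline: the data are synthetic, the 90/10 class balance, the feature count, the hyperparameter grids, and the use of class weighting are all assumptions made for the example, and XGBoost is left out only to keep the sketch free of extra dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (NOT the study's dataset).
# Only the sample count (343) comes from the abstract; the rest is assumed.
X, y = make_classification(
    n_samples=343, n_features=20, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Hypothetical candidate models with small illustrative grids.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000, class_weight="balanced"),
               {"clf__C": [0.1, 1.0, 10.0]}),
    "svc": (SVC(class_weight="balanced"),
            {"clf__C": [0.1, 1.0, 10.0], "clf__kernel": ["linear", "rbf"]}),
    "rf": (RandomForestClassifier(n_estimators=100, random_state=0),
           {"clf__max_depth": [3, None]}),
    "gb": (GradientBoostingClassifier(random_state=0),
           {"clf__learning_rate": [0.05, 0.1]}),
}

# Stratified folds keep the minority-class proportion similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, (model, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is refit on each training fold.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", model)])
    search = GridSearchCV(pipe, grid, scoring="f1_macro", cv=cv)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

# Pick the model family with the highest mean cross-validated macro F1.
best_name = max(results, key=lambda k: results[k][0])
print(best_name, round(results[best_name][0], 3))
```

Scoring with `f1_macro` weights both classes equally, which is a common choice when the clinical class is much smaller than the comparison group.
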
We provide in-depth classification metrics, confusion matrices, and receiver operating characteristic (ROC) curves. We also integrate interpretive discussions of bilingualism, ADHD, morphological/phonological awareness, and neural data.

This paper is structured in line with guidelines for high-impact research. The Introduction summarizes ADHD, bilingual language development, and the utility of fNIRS. The Methods describe participant selection, the dataset’s language and reading tasks, and the full ML pipeline from data preprocessing to hyperparameter tuning. The Results document classification performance and the derived metrics. The Discussion contextualizes the findings and outlines implications for early screening and interventions, bridging machine learning with children’s neurocognitive profiles. Finally, the Conclusions outline future directions for integrated neuroimaging and ML-based classification, especially for clinical subgroups in bilingual child populations.
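
To make the relationship between a confusion matrix and macro F1 concrete, here is a small worked sketch in plain Python. The counts in `cm` are invented for illustration and are not results from the paper; the helper names `per_class_f1` and `macro_f1` are hypothetical. The example shows how a classifier can score very well on the majority class while its macro F1 stays moderate, because the minority class contributes half of the average.

```python
def per_class_f1(cm):
    """Return one F1 score per class from a square confusion matrix.

    cm[t][p] is the number of samples with true class t predicted as p.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted other
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, true other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


# Illustrative counts only (rows: true class, columns: predicted class).
# Class 0 is the majority (non-ADHD) group, class 1 the minority (ADHD) group.
cm = [
    [60, 2],
    [5, 2],
]

f1_majority, f1_minority = per_class_f1(cm)
print(f"majority F1 = {f1_majority:.3f}")
print(f"minority F1 = {f1_minority:.3f}")
print(f"macro F1    = {macro_f1(cm):.3f}")
```

With these made-up counts the majority-class F1 is 120/127 and the minority-class F1 is 4/11, so the macro average lands well below the majority-class score.
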
