High-Accuracy Machine Learning Classification of Dyslexia vs. Control: A Spectrotemporal Approach Achieving 91.67% Accuracy
Abstract
Developmental dyslexia, a specific reading disability frequently accompanied by phonological processing deficits, has spurred extensive research into methods of accurate, early, and reliable diagnosis. Recent advances in machine learning (ML) techniques offer promising avenues to classify dyslexic vs. control populations based on multimodal data, ranging from cognitive test scores to audiogram metrics, thereby assisting practitioners and researchers in identifying impaired vs. normative reading profiles. In this study, we present a comprehensive supervised-learning pipeline applied to a dataset (n = 56, with 20 dyslexic and 36 control participants) containing 55 features, including reading test scores, Raven’s Progressive Matrices, auditory thresholds, and other cognitive measures. After thorough data preprocessing and the use of advanced ML algorithms (Random Forest, Gradient Boosting, XGBoost, and CatBoost), we obtained a top classification accuracy of 91.67% with Random Forest on a held-out test set.

Our approach integrates an iterative hyperparameter-tuning scheme employing stratified cross-validation and RandomizedSearchCV to promote model generalizability. Notably, the best-performing Random Forest model (n_estimators=200, max_depth=5, min_samples_split=5, min_samples_leaf=1) proved robust in discriminating dyslexic vs. control participants, echoing prior evidence that ensemble-based techniques can effectively capture complex, multidimensional feature interactions. We also compare our classification pipeline to prior methodologies, including the Auditory Classification Image (ACI) framework used by Varnet et al. (2016), highlighting how data-driven strategies can complement or extend the insights derived from psychoacoustic analyses.

Moreover, we contextualize our findings by discussing how advanced ML-based classification addresses the often heterogeneous nature of dyslexic reading profiles, potentially overcoming the limitations of analyses based on group means alone. Our results align with the growing body of research underscoring the importance of multimodal assessment for dyslexia and point to the feasibility of near-automatic, robust classification pipelines that integrate phonological, cognitive, and audiometric features. We conclude that sophisticated ML systems, driven by thorough data preprocessing and feature engineering, hold significant promise for improving classification accuracy, paving the way for improved diagnostics in both clinical and research settings.
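
The tuning scheme described above (a stratified hold-out split, then RandomizedSearchCV over a Random Forest with stratified cross-validation) might be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' code. The data are synthetic stand-ins with the same shape as the study dataset (56 rows, 55 features, 36 controls and 20 dyslexic participants), and the 20% test fraction, the search grid, the number of folds, `n_iter`, and all random seeds are assumptions, since the abstract does not report them. The grid is chosen only so that it contains the reported best configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    train_test_split,
)

# Synthetic stand-in data (assumption): 36 controls (0), 20 dyslexic (1),
# 55 random features, with a shift on the first five so classes differ.
rng = np.random.default_rng(0)
y = np.array([0] * 36 + [1] * 20)
X = rng.normal(size=(56, 55))
X[y == 1, :5] += 1.0

# Stratified hold-out split; the 20% test fraction is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search space that includes the reported best setting
# (n_estimators=200, max_depth=5, min_samples_split=5, min_samples_leaf=1).
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=6,            # number of sampled configurations (assumption)
    cv=cv,
    scoring="accuracy",
    random_state=0,
)
search.fit(X_train, y_train)  # refits the best configuration on all training data

# Evaluate once on the held-out test set, which the search never saw.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("held-out accuracy:", test_accuracy)
```

Keeping the test set out of the search is the design choice that matters here: the cross-validated score guides hyperparameter selection, while the held-out accuracy is computed only once at the end. On synthetic data the printed accuracy says nothing about the study's reported figure.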