A Comparative Analysis of Machine Learning Classification Techniques in the Prediction of Autism in Children

Akintayo Ayoade
Ebierimunu Abule
Chisom Onwugbenu
Idowu Olugbenga Adewumi

Abstract

Autism Spectrum Disorder (ASD) affects around 1 in 100 children worldwide, but timely diagnosis remains limited by subjective clinical assessments. This research compared nine supervised machine learning (ML) classifiers for predicting ASD in children: Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and three ensemble methods (Bagging, Boosting with XGBoost, and Stacking). Two synthetic datasets of 50,000 samples each were used: an imbalanced dataset with 10% positive cases (5,000 autistic, 45,000 non-autistic) and a balanced dataset with 50% positive cases (25,000 autistic, 25,000 non-autistic). Each dataset included 19 features covering demographics (3 attributes), parental/medical history (3 attributes), behavioral screening items (10 binary responses), one combined score, and a binary target label. Evaluation metrics comprised accuracy, precision, recall, F1-score, AUROC, and AUPRC. On the imbalanced dataset, RF reached an F1-score of 0.75, AUROC of 0.91, and AUPRC of 0.66, surpassing LR (F1 = 0.51) and KNN (F1 = 0.53). SVM and MLP trailed closely, with F1-scores of 0.71 to 0.73. On the balanced dataset, ensemble models performed notably better: Stacking attained an F1-score of 0.91, AUROC of 0.96, and AUPRC of 0.95, while Boosting yielded F1 = 0.90 and AUROC = 0.95. Baseline models such as LR and DT showed moderate improvements, achieving F1-scores of approximately 0.80–0.81. Statistical validation with McNemar’s test revealed a significant difference (p = 0.040) between RF and SVM on the imbalanced dataset. Computational cost also varied, with LR finishing in 5.5 seconds, RF in 17.0 seconds, and MLP in 37.3 seconds.
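To illustrate why the abstract reports F1-score and AUPRC alongside accuracy, the following minimal sketch computes the threshold-based metrics from confusion-matrix counts using the class sizes of the imbalanced dataset (5,000 positive, 45,000 negative). The function name `binary_metrics` and both example classifiers are hypothetical and are not taken from the study; the counts for the second classifier are illustrative only, not reported results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a model predicts no positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Class counts of the imbalanced dataset, as stated in the abstract.
N_POS, N_NEG = 5_000, 45_000

# Hypothetical classifier that labels every child as non-autistic.
always_negative = binary_metrics(tp=0, fp=0, fn=N_POS, tn=N_NEG)

# Hypothetical classifier with made-up counts (NOT results from the study).
illustrative = binary_metrics(tp=3_500, fp=1_000, fn=1_500, tn=44_000)

# Positive-class prevalence: the AUPRC a random ranking would be expected to score.
prevalence = N_POS / (N_POS + N_NEG)

if __name__ == "__main__":
    print("always negative:", always_negative)
    print("illustrative:   ", illustrative)
    print("prevalence:     ", prevalence)
```

Under these assumptions, the always-negative classifier is 90% accurate yet has an F1-score of zero, which is why accuracy alone says little on the 10%-positive dataset. The prevalence of 0.10 is the chance-level reference against which an AUPRC value on that dataset can be read.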
The findings indicate that ensemble models, especially Stacking and Boosting, deliver higher predictive accuracy and reliability across class distributions, suggesting their potential for incorporation into clinical decision support systems for scalable, data-informed early detection of ASD.
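The abstract validates the RF versus SVM comparison with McNemar’s test, which considers only the test cases on which two classifiers disagree. The sketch below shows one common form of it, the exact two-sided binomial version. The abstract does not say which variant the authors used (exact or chi-squared), so this is an assumption, and the helper names `discordant_counts` and `mcnemar_exact` and the example counts are hypothetical.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count cases where exactly one of two classifiers is correct.

    Returns (b, c): b = A right and B wrong, c = A wrong and B right.
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == truth), (other == truth)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis each disagreement is equally likely to
    favour either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree; no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Made-up labels and predictions for illustration (not study data).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    pred_a = [1, 0, 1, 0, 0, 0, 1, 1]
    pred_b = [1, 0, 0, 0, 0, 1, 0, 1]
    b, c = discordant_counts(y_true, pred_a, pred_b)
    print("discordant counts:", b, c)
    print("p-value:", mcnemar_exact(b, c))
```

Because the test uses only the discordant pairs, two models with similar overall accuracy can still differ significantly if their errors fall on different cases.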

Version published to 10.21203/rs.3.rs-7779588/v1 on Research Square
Oct 8, 2025

Related articles

Enhanced Machine Learning Models for Parkinson’s Disease Detection Using Spiral Drawing Data: A Comprehensive Study

This article has 1 author:
1. Dang Dinh Son
This article has no evaluations. Latest version: Oct 10, 2025

Machine Learning Techniques for Predicting Brain Stroke Risk: Addressing Data Imbalance

This article has 2 authors:
1. Heshan Chandeepa Pathmakumara
2. Kavishka Thathsarani Rajapaksha
This article has no evaluations. Latest version: Oct 10, 2025

Precision Lesion Profiling in Multiple Sclerosis: A Novel Pipeline for EDSS Prediction

This article has 3 authors:
1. Roba Gamal
2. Hoda Barka
3. Mayada Hadhoud
This article has no evaluations. Latest version: Sep 23, 2025
