Augmentative Semi-Supervised Learning for Autism Screening: A Novel Framework

Abstract

Autism Spectrum Disorder (ASD) is a neurodevelopmental condition for which early identification is essential to providing appropriate support and effective treatment. However, current diagnostic methods are resource-intensive and often inaccessible. Artificial Intelligence offers a promising alternative, but its effectiveness is hindered by algorithmic bias arising from data scarcity and from imbalanced, largely unlabeled datasets. Such bias can lead to model overfitting, impaired learning, and poor generalization. While semi-supervised learning (SSL) can reduce reliance on manual labels through pseudo-label generation, conventional SSL approaches perform poorly under severe class imbalance, often amplifying label noise and bias. To address these challenges, we propose a novel Augmentative Semi-supervised Learning (ASSL) framework designed for robust learning in the presence of class imbalance and label scarcity. ASSL first applies pattern-based sampling to construct a balanced labeled dataset. It then employs a Collaborative Decision Labeling (CDL) strategy, in which two heterogeneous models assign pseudo-labels using Dynamic Dual Thresholding (DDT), retaining only samples jointly and confidently labeled by both models. The framework was applied to the Autism AI dataset of over 12,000 participants, most of whom lacked diagnostic labels, producing severe class imbalance. ASSL improved sensitivity by 15.3%, specificity by 30.2%, and accuracy by 15.9% over conventional screening methods. In external validation on the NHANES diabetes dataset, ASSL achieved a 7.9% gain in sensitivity and improved discriminatory performance under imbalance. These results demonstrate that ASSL is a scalable and generalizable approach for health data tasks with limited and imbalanced data, offering a pathway to reduce algorithmic bias across screening applications.
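
The CDL step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how DDT sets its thresholds, so everything below is an assumption made for illustration: the function names, the linear relaxation schedule in `dual_thresholds`, and the use of one upper cutoff for the positive class and one lower cutoff for the negative class are hypothetical and not the authors' implementation. The inputs are taken to be positive-class probabilities from two already-trained models.

```python
from typing import List, Optional, Sequence, Tuple


def dual_thresholds(round_idx: int, start: float = 0.95,
                    floor: float = 0.80, step: float = 0.05) -> Tuple[float, float]:
    """Hypothetical schedule (an assumption, not from the paper).

    The positive-class cutoff starts strict and relaxes each round,
    never dropping below `floor`; the negative-class cutoff mirrors it.
    Returns (upper, lower).
    """
    upper = max(floor, start - step * round_idx)
    return upper, 1.0 - upper


def confident_label(p: float, upper: float, lower: float) -> Optional[int]:
    """Return 1 if p >= upper, 0 if p <= lower, else None (not confident)."""
    if p >= upper:
        return 1
    if p <= lower:
        return 0
    return None


def collaborative_pseudo_labels(probs_a: Sequence[float],
                                probs_b: Sequence[float],
                                upper: float,
                                lower: float) -> List[Tuple[int, int]]:
    """Keep (index, label) only where both models are confident and agree."""
    kept = []
    for i, (pa, pb) in enumerate(zip(probs_a, probs_b)):
        label_a = confident_label(pa, upper, lower)
        label_b = confident_label(pb, upper, lower)
        if label_a is not None and label_a == label_b:
            kept.append((i, label_a))
    return kept


# Illustrative probabilities from two hypothetical models on five unlabeled samples.
probs_a = [0.97, 0.03, 0.92, 0.55, 0.08]
probs_b = [0.95, 0.40, 0.04, 0.60, 0.02]
pseudo = collaborative_pseudo_labels(probs_a, probs_b, upper=0.9, lower=0.1)
print(pseudo)
```

In this sketch, a sample is discarded when either model is unsure or when the two models disagree, which is one plausible reading of "jointly and confidently labeled by both models"; the retained pairs would then be added to the labeled set before the next training round.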
