A Machine-Learning-Based Investigation of ADHD Diagnosis Using the HYPERAKTIV Dataset

Abstract

Attention-Deficit/Hyperactivity Disorder (ADHD) is a common neuropsychiatric condition affecting up to 5% of adults worldwide, imposing significant burdens on daily functioning, social interactions, and overall quality of life. Traditional diagnostic practice relies largely on subjective evaluations and clinical observation. Recent computational approaches have explored objective diagnostic support using wearable sensors and machine-learning algorithms, but reproducibility and standard benchmarks remain limited. In this paper, we present an end-to-end study of the publicly available HYPERAKTIV dataset, which comprises motor activity data, heart-rate data, computerized test scores (CPT-II), and comprehensive patient information (demographics, diagnostic assessments, medication use). We examine data from 103 participants (51 diagnosed with ADHD and 52 clinical controls) and propose a supervised-learning pipeline covering data preprocessing, feature extraction, hyperparameter tuning, model evaluation, and result visualization. We obtain promising results with Logistic Regression (73.08% accuracy), Random Forest (76.92%), and XGBoost (80.77%). Our findings confirm the feasibility of combining objective sensor data with classic neuropsychological testing to differentiate adult ADHD from other clinical conditions. The paper provides a step-by-step methodology, complete with code references, experimental details, metrics, and interpretive insights. We compare our work against a prior study on the same dataset (Hicks et al., HYPERAKTIV: An Activity Dataset from Adult Patients with ADHD) and discuss new aspects, such as additional feature engineering and improved classification performance. Finally, we provide an extensive discussion of limitations and potential future directions, including multi-modal data fusion, interpretability, and real-world clinical applications.
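The following is a minimal sketch of the kind of supervised-learning pipeline the abstract describes (train/test split, per-model fitting, and accuracy reporting for Logistic Regression, Random Forest, and XGBoost). The input file name, column names, and preprocessing details are illustrative assumptions, not taken from the paper or its released code.

```python
# Sketch of a binary ADHD-vs-control classification pipeline.
# Assumes a pre-extracted feature table with one row per participant:
# summary features from activity/heart-rate recordings plus CPT-II scores,
# an "ID" column, and a binary "ADHD" label. All names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

df = pd.read_csv("hyperaktiv_features.csv")      # assumed feature file
X = df.drop(columns=["ID", "ADHD"])              # assumed column names
y = df["ADHD"]

# Stratified hold-out split to preserve the ADHD/control ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "LogisticRegression": Pipeline([
        ("scale", StandardScaler()),             # LR benefits from scaled features
        ("clf", LogisticRegression(max_iter=1000)),
    ]),
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.4f}")
```

In practice, the hyperparameter tuning mentioned in the abstract would typically wrap each model in a cross-validated search (e.g., scikit-learn's GridSearchCV) before evaluating on the held-out split; the sketch omits this step for brevity.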
