Evaluating the Generalizability of EEG-Based AI Models in Alzheimer’s and Dementia Diagnosis
Abstract
We investigated, at the individual-subject level, the generalizability of deep learning models trained on electroencephalography (EEG) data to detect Alzheimer’s disease and dementia. Although average model performance appears strong, it may obscure large inter-individual variability, raising concerns for clinical deployment.
METHODS
We trained a Hopfield-enhanced deep neural network on a publicly available EEG dataset consisting of 88 participants, including individuals diagnosed with Alzheimer’s disease (AD), frontotemporal dementia (FTD), and cognitively normal controls (CN). Resting-state EEG recordings were segmented and used to train the model in a leave-one-subject-out (LOSO) cross-validation setup across multiple detection tasks: AD vs. CN, AD vs. FTD, FTD vs. CN, and AD vs. FTD vs. CN.
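As an illustration of the LOSO setup, the following is a minimal sketch of how segment indices might be split so that all segments from one subject are held out per fold. The function name `loso_splits`, the subject IDs, and the segment counts are hypothetical and are not taken from the study's code or data.

```python
from collections import defaultdict


def loso_splits(segment_subjects):
    """Yield (held_out_subject, train_idx, test_idx) for each LOSO fold.

    segment_subjects[i] is the subject ID that EEG segment i came from.
    (Hypothetical helper, not the authors' implementation.)
    """
    by_subject = defaultdict(list)
    for idx, subject in enumerate(segment_subjects):
        by_subject[subject].append(idx)

    for held_out in sorted(by_subject):
        # Every segment of the held-out subject goes to the test set.
        test_idx = by_subject[held_out]
        # Segments of all other subjects form the training set.
        train_idx = sorted(
            i for s, idxs in by_subject.items() if s != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example: three made-up subjects with 2, 3, and 1 segments.
segment_subjects = ["s01", "s01", "s02", "s02", "s02", "s03"]
folds = list(loso_splits(segment_subjects))
for held_out, train_idx, test_idx in folds:
    # No segment index appears on both sides of a split.
    assert not set(train_idx) & set(test_idx)
    print(held_out, train_idx, test_idx)
```

Splitting by subject rather than by segment is what keeps segments from the same person out of both the training and test sets within a fold.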
RESULTS
While the model demonstrated high average performance (e.g., up to 83% accuracy), subject-level results revealed inconsistencies. For some individuals, the model achieved perfect prediction even at the first training epoch, suggesting spurious memorization, while for others it predicted incorrectly throughout, with performance below chance. These patterns persisted despite consistent training conditions and no data leakage.
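The gap between average and subject-level performance can be sketched with a small example. The labels and predictions below are invented purely for illustration and are not results from the study; the helper names and the 0.5 chance level (binary task) are likewise assumptions.

```python
from collections import defaultdict


def per_subject_accuracy(subjects, y_true, y_pred):
    """Return {subject: accuracy} computed over that subject's segments."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for subject, truth, pred in zip(subjects, y_true, y_pred):
        totals[subject] += 1
        hits[subject] += int(truth == pred)
    return {s: hits[s] / totals[s] for s in totals}


def pooled_accuracy(y_true, y_pred):
    """Accuracy over all segments, ignoring which subject they belong to."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy data: three subjects, four segments each, binary labels.
subjects = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

pooled = pooled_accuracy(y_true, y_pred)
per_subject = per_subject_accuracy(subjects, y_true, y_pred)
# Assumed chance level for a balanced binary task.
below_chance = sorted(s for s, acc in per_subject.items() if acc < 0.5)

print(pooled)        # 0.75
print(per_subject)   # {'a': 1.0, 'b': 1.0, 'c': 0.25}
print(below_chance)  # ['c']
```

In this toy case the pooled accuracy looks acceptable even though one subject is classified below chance, which is the kind of pattern a group-level metric alone would hide.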
DISCUSSION
Our findings highlight that strong group-level performance may be misleading in clinical settings, where decisions are made at the individual level. Models should generalize across individuals and be evaluated per individual before being considered for diagnostic use. Hopfield networks show promise in capturing patterns in EEG data, but patient-level validation and transparent reporting are essential to avoid premature clinical translation.
Highlights
- Deep learning models trained on EEG can achieve high average performance in detecting Alzheimer’s disease and dementia.
- The subject-level evaluation revealed significant variability, including below-chance performance for some individuals despite overall strong results.
- Group-level metrics alone may be misleading; generalizable models and individual-level validation are critical before the clinical adoption of EEG-based AI models.