Evaluating the Generalizability of EEG-Based AI Models in Alzheimer’s and Dementia Diagnosis



Abstract

We investigated how well deep learning models trained on electroencephalography (EEG) data generalize to individual subjects when detecting Alzheimer’s disease and dementia. Strong average model performance may obscure large inter-individual variability, which raises concerns for clinical deployment.

METHODS

We trained a Hopfield-enhanced deep neural network on a publicly available EEG dataset of 88 participants: individuals diagnosed with Alzheimer’s disease (AD) or frontotemporal dementia (FTD), and cognitively normal controls (CN). Resting-state EEG recordings were segmented, and the model was trained and evaluated with leave-one-subject-out (LOSO) cross-validation on four detection tasks: AD vs. CN, AD vs. FTD, FTD vs. CN, and AD vs. FTD vs. CN.
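The segmentation and LOSO protocol can be sketched as follows. This is a minimal illustration, not the authors’ pipeline: the helper names `segment_recording` and `loso_splits`, the window length, the non-overlapping windows and the random toy recordings are all hypothetical. The key property it shows is that every segment of the held-out subject goes to the test fold together, so no subject contributes segments to both training and testing.

```python
import numpy as np

def segment_recording(eeg, seg_len, step=None):
    """Split a (channels, samples) recording into windows of seg_len samples.

    Hypothetical helper: windows are non-overlapping when step is None.
    Returns an array of shape (n_segments, channels, seg_len).
    """
    step = seg_len if step is None else step
    n_samples = eeg.shape[1]
    if n_samples < seg_len:
        raise ValueError("recording is shorter than one segment")
    starts = range(0, n_samples - seg_len + 1, step)
    return np.stack([eeg[:, s:s + seg_len] for s in starts])

def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for leave-one-subject-out CV.

    All segments belonging to one subject are held out together.
    """
    subject_ids = np.asarray(subject_ids)
    for subj in np.unique(subject_ids):
        test_mask = subject_ids == subj
        yield subj, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy usage with random data standing in for three subjects' recordings
# (19 channels x 1000 samples each; values are illustrative only).
rng = np.random.default_rng(0)
recordings = {f"sub-{i:02d}": rng.standard_normal((19, 1000)) for i in range(1, 4)}

segments, subject_of_segment = [], []
for subj, eeg in recordings.items():
    segs = segment_recording(eeg, seg_len=250)
    segments.append(segs)
    subject_of_segment += [subj] * len(segs)

X = np.concatenate(segments)           # (n_segments, channels, seg_len)
groups = np.array(subject_of_segment)  # subject label for every segment

for subj, train_idx, test_idx in loso_splits(groups):
    # No subject may appear on both sides of the split (no subject-level leakage).
    assert not set(groups[train_idx]) & set(groups[test_idx])
    print(subj, len(train_idx), len(test_idx))
```

Grouping the split by subject rather than by segment matters because segments cut from the same recording are highly correlated; a segment-level random split would let the model see the test subject during training.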

RESULTS

While the model achieved high average performance (up to 83% accuracy), subject-level results revealed inconsistencies. Some individuals were predicted perfectly from the first training epoch, suggesting spurious memorization, while others were misclassified throughout training, with performance below chance. These patterns persisted despite consistent training conditions and the absence of data leakage.
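The gap between pooled and per-subject accuracy can be made concrete with a small sketch. The function `subject_level_report` and the toy labels below are hypothetical and are not data from the study; chance level is assumed here to be 1 / n_classes, which is a simplification for segment-level accuracy.

```python
import numpy as np

def subject_level_report(y_true, y_pred, subject_ids, n_classes):
    """Per-subject segment accuracy, flagging perfect and below-chance subjects.

    Hypothetical helper; chance is assumed to be 1 / n_classes.
    """
    y_true, y_pred, subject_ids = map(np.asarray, (y_true, y_pred, subject_ids))
    chance = 1.0 / n_classes
    report = {}
    for subj in np.unique(subject_ids):
        mask = subject_ids == subj
        acc = float(np.mean(y_true[mask] == y_pred[mask]))
        report[str(subj)] = {
            "accuracy": acc,
            "perfect": acc == 1.0,
            "below_chance": acc < chance,
        }
    return report

# Toy binary task (e.g. AD = 1 vs. CN = 0), four segments per subject.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
subjects = ["s1"] * 4 + ["s2"] * 4 + ["s3"] * 4

pooled = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
report = subject_level_report(y_true, y_pred, subjects, n_classes=2)
print(f"pooled accuracy: {pooled:.2f}")
for subj, stats in report.items():
    print(subj, stats)
```

In this toy case the pooled accuracy is about 0.67, yet one subject is classified perfectly and another sits at 0.25, below the assumed chance level of 0.5. A single aggregate number cannot distinguish this from a model that is uniformly mediocre across subjects.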

DISCUSSION

Our findings show that strong group-level performance can be misleading in clinical settings, where decisions are made for individual patients. Models should generalize across individuals and be evaluated per individual before being considered for diagnostic use. Hopfield networks show promise for capturing patterns in EEG data, but patient-level validation and transparent reporting are essential to avoid premature clinical translation.
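The abstract does not specify which Hopfield formulation the network uses. As a hedged illustration of what a Hopfield layer computes, the sketch below implements the continuous modern Hopfield retrieval update popularized by Ramsauer et al. (2020), in which a query state is replaced by a softmax-weighted combination of stored patterns. The function name `hopfield_retrieve`, the inverse temperature `beta` and the toy patterns are assumptions for illustration only, not the authors’ architecture.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(stored, query, beta=8.0, n_steps=1):
    """Continuous modern Hopfield retrieval (hypothetical helper).

    stored: (d, N) matrix whose columns are stored patterns.
    query:  (d,) initial state.
    Update: xi <- stored @ softmax(beta * stored.T @ xi).
    """
    xi = query.astype(float)
    for _ in range(n_steps):
        xi = stored @ softmax(beta * stored.T @ xi)
    return xi

# Toy usage: three orthonormal 16-dimensional patterns; query is a noisy copy of pattern 1.
rng = np.random.default_rng(0)
patterns, _ = np.linalg.qr(rng.standard_normal((16, 3)))  # columns are orthonormal
noisy = patterns[:, 1] + 0.1 * rng.standard_normal(16)
retrieved = hopfield_retrieve(patterns, noisy, beta=8.0)
best = int(np.argmax(patterns.T @ retrieved))
print("closest stored pattern:", best)
```

Larger `beta` makes retrieval sharper (closer to returning a single stored pattern), while smaller values average over several patterns; this trade-off is one reason such layers can either generalize or effectively memorize training examples.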

Highlights

  • Deep learning models trained on EEG can achieve high average performance in detecting Alzheimer’s disease and dementia.

  • Subject-level evaluation revealed substantial variability, including below-chance performance for some individuals despite strong overall results.

  • Group-level metrics alone may be misleading; generalizable models and individual-level validation are critical before EEG-based AI models are adopted clinically.
