Generalizability, Interpretability, and Clinical Readiness of Deep Learning Methods for Alzheimer’s Disease: A Systematic Literature Review

Abstract

Early and accurate identification of Alzheimer’s disease is essential for prompt intervention and medical care. Recent work applies machine learning and deep learning techniques to neuroimaging, genetic, and clinical data to distinguish patients with Alzheimer’s disease from cognitively normal individuals and to predict progression from mild cognitive impairment. This systematic literature review examines studies published between 2019 and 2025 that used major datasets, including the Alzheimer’s Disease Neuroimaging Initiative, and inputs such as T1-weighted magnetic resonance imaging, electroencephalography, and multimodal neuroimaging data. The reviewed approaches encompass voxel-wise three-dimensional convolutional neural networks, hybrid convolutional neural network–transformer architectures, attention-based multimodal fusion frameworks, and conventional machine learning models such as Random Forest, Extreme Gradient Boosting, and Generalized Linear Models. Common preprocessing techniques include intensity correction, spatial normalization, and skull stripping, along with data augmentation through rotations, flips, and generative adversarial network–based oversampling. The primary evaluation metrics reported are accuracy, sensitivity, specificity, F1-score, and area under the receiver operating characteristic curve. Interpretability techniques such as Grad-CAM, Layer-Wise Relevance Propagation, and saliency maps are increasingly adopted to visualize discriminative brain regions. Models that integrate hybrid architectures and multimodal information demonstrate enhanced robustness, but external validation remains limited. Persistent challenges include class imbalance, subject-level data leakage, small dataset sizes, and poor cross-cohort generalizability. Future research should emphasize larger, multi-center datasets, standardized evaluation protocols, and interpretable models that are clinically meaningful and translatable.
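
Subject-level data leakage, named above as a persistent challenge, typically arises when several scans from the same participant are distributed across both the training and the test set. The following is a minimal sketch of a subject-wise split, not a procedure taken from any reviewed study; the function name `subject_level_split`, the subject identifiers, and the test fraction are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def subject_level_split(scan_subject_ids, test_fraction=0.2, seed=0):
    """Split scan indices so that no subject appears in both sets.

    scan_subject_ids: one subject identifier per scan (hypothetical input).
    Returns (train_idx, test_idx) as sorted lists of scan indices.
    """
    # Group scan indices by the subject they belong to.
    by_subject = defaultdict(list)
    for idx, subject in enumerate(scan_subject_ids):
        by_subject[subject].append(idx)

    # Shuffle subjects (not scans) so whole subjects move together.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    train_idx = [i for s in subjects if s not in test_subjects
                 for i in by_subject[s]]
    test_idx = [i for s in test_subjects for i in by_subject[s]]
    return sorted(train_idx), sorted(test_idx)


# Hypothetical cohort: ten scans from six subjects, some scanned repeatedly.
scan_subject_ids = ["s01", "s01", "s02", "s03", "s03",
                    "s03", "s04", "s05", "s05", "s06"]
train_idx, test_idx = subject_level_split(scan_subject_ids, test_fraction=0.33)

train_subjects = {scan_subject_ids[i] for i in train_idx}
test_subjects = {scan_subject_ids[i] for i in test_idx}
print("train subjects:", sorted(train_subjects))
print("test subjects:", sorted(test_subjects))
```

Splitting on scans rather than subjects would let a model see near-duplicate images of a test participant during training, which can inflate reported accuracy. Grouping by subject before shuffling is one simple way to avoid that.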
