Automated Detection of Early-Stage Dementia Using Large Language Models: A Comparative Study on Narrative Speech

Abstract

The growing global burden of dementia underscores the urgent need for scalable, objective screening tools. While traditional diagnostic methods rely on subjective assessments, advances in natural language processing offer promising alternatives. In this study, we compare two classes of language models, encoder-based pretrained language models (PLMs) and autoregressive large language models (LLMs), for detecting cognitive impairment from narrative speech. Using the DementiaBank Pitt Corpus and the widely used Cookie Theft picture description task, we evaluate BERT as a representative PLM alongside GPT-2, GPT-3.5 Turbo, GPT-4, and LLaMA-2 as LLMs. Although all of these models are pretrained, we distinguish PLMs from LLMs by their architectural differences and training paradigms. Our findings show that BERT outperforms all other models, achieving 86% sensitivity and 95% specificity. LLaMA-2 follows closely, while GPT-4 and GPT-3.5 Turbo underperform on this structured classification task. Interestingly, the LLMs demonstrate complementary strengths in capturing narrative richness and subtler linguistic features. These results suggest that hybrid modeling approaches may offer enhanced performance and interpretability. Our study highlights the potential of language models as digital biomarkers and lays the groundwork for scalable, AI-powered tools to support early dementia screening in clinical practice.
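For readers interpreting the reported 86% sensitivity and 95% specificity, the following minimal sketch shows how these two metrics are computed from binary predictions (1 = dementia, 0 = control). The labels and predictions here are illustrative placeholders, not data from the study:

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity (true positive rate) and specificity
    (true negative rate) for binary labels: 1 = dementia, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Illustrative (made-up) ground truth and model predictions:
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
# sens = 0.75, spec = 0.75
```

In a screening context, sensitivity reflects how many true dementia cases the model catches, while specificity reflects how many healthy controls it correctly clears, so BERT's 95% specificity implies relatively few false alarms.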
