Screening cognitive assessments (MMSE, CDT, MoCA) of eight large language models
Abstract
In recent years we have observed rapid growth of large language models (LLMs), a subset of artificial intelligence techniques focused on the processing and generation of textual information. We were unable to find any study that assessed the cognitive performance of LLMs using tools designed for screening cognitive impairment in humans. Hence, the aim of this project was to study eight LLMs (GPT-5, Claude Sonnet 4.5, DeepSeek V3.1, Gemini 2.5 Flash, Granite, Grok 4, Mistral 3 and Llama 4) using three common cognitive tests: the Mini-Mental State Examination (MMSE), the Clock Drawing Test (CDT) and the Montreal Cognitive Assessment (MoCA). Each test was administered three times to every LLM. Applying these screening tools, we found differences in the cognitive performance of the LLMs. Of all the models tested, Claude Sonnet 4.5 performed best; however, even that model failed to achieve the maximum available scores on the MMSE and MoCA. We observed a distinctive modality gap (visual vs. language tasks), with high performance on tasks requiring language processing. On certain tasks requiring visual information processing, the performance of many models was poor or very poor, while on others it was very good. Two tasks (the CDT and the Trail Making Test of the MoCA) were the most difficult for the LLMs. In the case of Claude Sonnet 4.5, we found that using LLMs as testing subjects can be challenging.