Screening cognitive assessments (MMSE, CDT, MoCA) of eight large language models
Abstract
In recent years we have observed rapid growth of large language models (LLMs), a subset of artificial intelligence techniques focused on the processing and generation of textual information. We were unable to find any study that assessed the cognitive performance of LLMs using tools designed for screening cognitive impairment in humans. Hence, the aim of this project was to study eight LLMs (GPT-5, Claude Sonnet 4.5, DeepSeek V3.1, Gemini 2.5 Flash, Granite, Grok 4, Mistral 3 and Llama 4) using three common cognitive tests: the Mini-Mental State Examination (MMSE), the Clock Drawing Test (CDT) and the Montreal Cognitive Assessment (MoCA). Each test was administered three times to every LLM. Applying these screening tools, we found differences in the cognitive performance of the LLMs. Of all the models tested, Claude Sonnet 4.5 performed best; however, even that model failed to achieve the maximum available scores on the MMSE and MoCA. We observed a distinctive modality gap (visual vs. language tasks), with high performance on tasks requiring language processing. On certain tasks requiring visual information processing, the performance of many models was poor or very poor, while on others it was very good. Two tasks (the CDT and the Trail Making Test of the MoCA) were the most difficult for the LLMs. In the case of Claude Sonnet 4.5, we found that using LLMs as testing subjects can be challenging.