Evaluating Accuracy and Reasoning Capabilities of Large Language Models for Acute Ischemic Stroke Management

Aymen Meddeb
Navid Bakhtiari
Ida Rangus
Leonard Fetscher
Bastien Leguellec
Felix Busch
Alexandre Doucet
Vi-Tuan Hua
Fuong Verot-Nguyen
Laurentiu Valentin Paiusan
Pierre Francois Manceau
Paolo Pagano
Mike P. Wattjes
Solene Moulin
Laurent Pierot
Sebastien Soize


Abstract

Background

Acute ischemic stroke (AIS) management has evolved substantially over the past two decades, and mechanical thrombectomy has added complexity because it requires specialized centers. As many patients initially present to primary care facilities, rapid and accurate triage is critical. Large language models (LLMs) may help bridge expertise gaps, especially where stroke specialists are not immediately available. This study evaluates the diagnostic accuracy and reasoning quality of four LLMs in determining eligibility for intravenous thrombolysis (IVT) and mechanical thrombectomy (MT), compared with experienced clinicians and real-world treatment decisions.

Methods

We retrospectively collected 80 acute ischemic stroke cases from two stroke centers. Cases were presented to the LLMs and to the clinicians as clinical vignettes containing demographic, clinical, and imaging data. Four LLMs (DeepSeek R1, OpenAI o3 mini, Gemini 2.0, LLaMA 3.3) and six stroke experts (two neurologists, four neuroradiologists) independently reviewed the cases and recommended one or more treatment strategies, including IVT and MT. The ground truth was defined as the institutional treatment decision. Accuracy of the MT and IVT recommendations was calculated for both the LLMs and the clinicians. Additionally, a qualitative error analysis evaluated the reasoning of the LLMs.
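As an illustration of the accuracy metric described in the Methods, the following minimal Python sketch scores one rater's recommendations against the institutional decision. It assumes each recommendation can be reduced to a yes/no flag per treatment (MT, IVT); the study's exact scoring procedure may differ. The `CaseDecision` schema, the `accuracy` helper, and the four toy cases are hypothetical and are not study data.

```python
from dataclasses import dataclass


@dataclass
class CaseDecision:
    """Hypothetical per-case record: was MT and/or IVT recommended?"""
    mt: bool
    ivt: bool


def accuracy(predictions: list[CaseDecision],
             ground_truth: list[CaseDecision],
             field: str) -> float:
    """Fraction of cases where the rater's recommendation for `field`
    ("mt" or "ivt") matches the institutional (ground-truth) decision."""
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("need equal-length, non-empty case lists")
    matches = sum(
        getattr(p, field) == getattr(g, field)
        for p, g in zip(predictions, ground_truth)
    )
    return matches / len(predictions)


# Four invented cases (illustration only, not taken from the study).
truth = [CaseDecision(mt=True, ivt=True), CaseDecision(mt=True, ivt=False),
         CaseDecision(mt=False, ivt=True), CaseDecision(mt=False, ivt=False)]
model = [CaseDecision(mt=True, ivt=False), CaseDecision(mt=True, ivt=False),
         CaseDecision(mt=False, ivt=True), CaseDecision(mt=True, ivt=True)]

print(f"MT accuracy:  {accuracy(model, truth, 'mt'):.2f}")   # 3 of 4 cases match
print(f"IVT accuracy: {accuracy(model, truth, 'ivt'):.2f}")  # 2 of 4 cases match
```

Scoring MT and IVT as separate binary decisions mirrors how the abstract reports separate accuracies for each treatment; a case can therefore be correct for MT but wrong for IVT.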

Results

The open-source reasoning model DeepSeek R1 outperformed all other LLMs and clinicians for MT (87% accuracy) and achieved 78% accuracy for IVT. Across models, accuracy was generally higher for MT than for IVT. Neurologists reached 81% (MT) and 80% (IVT), while neuroradiologists achieved 84% (MT) and 76% (IVT). Reasoning analysis of MT recommendations showed that most errors were clinically reasonable but differed from the real-world decision, whereas IVT errors were primarily due to guideline non-adherence.

Conclusions

LLMs can match or even exceed expert clinician performance in MT and IVT eligibility decisions, while providing transparent reasoning. These findings support prospective evaluation of LLM-based decision support in acute stroke care, especially in settings without immediate specialist expertise.

Version published to 10.1101/2025.09.03.25335060 on medRxiv
Sep 5, 2025

