AI-Assisted Decision-Making in Stroke Neuroimaging: A Prospective Study of Accuracy, Confidence, and Decision Change Among Emergency Medicine Residents

Abstract

Background: Artificial intelligence (AI)-based decision-support systems are increasingly integrated into emergency neuroimaging workflows. Evidence on whether AI feedback improves or worsens clinical decision quality remains limited. This prospective simulation study evaluated the effect of an AI decision-support system on emergency medicine residents' diagnostic accuracy, confidence, and decision-change patterns in stroke neuroimaging interpretation.

Methods: Forty-eight emergency medicine residents (experience: 1–46 months; median 25.5 months) each independently assessed 35 neuroimaging cases selected from a large, clinically confirmed institutional repository (16 stroke-positive, 19 normal; prevalence 45.7%). An independent expert radiologist without access to AI outputs selected the cases. Residents recorded an initial binary diagnosis and a confidence rating (1–5 Likert scale), then reviewed the AI output and made a final decision with an updated confidence rating. Each of the 1680 evaluations was assigned to one of eight mutually exclusive decision scenarios. To address clustering (48 residents × 35 cases), analyses were conducted at both the evaluation level (n = 1680) and the resident level (n = 48); mixed-effects linear regression was applied as a pre-specified sensitivity analysis.

Results: Residents' initial accuracy was 72.7% (95% CI 70.6–74.8%; MCC 0.457). The AI system, reported as a contextual comparator, yielded 62.9% accuracy (sensitivity 81.3%, specificity 47.4%). Following AI feedback, 86 of 179 decision changes (48.0%) improved accuracy versus 54 (30.2%) that worsened it, a significant net benefit of +32 (binomial test, p = 0.004). Residents resisted erroneous AI output in 315 evaluations (21.0%; S6, Robust Resistance) and missed correct AI guidance in 117 (7.8%; S7, Missed Opportunity). AI feedback produced a modest but significant confidence increase (Δ = +0.165; p < 0.001). Mixed-effects regression confirmed no significant experience-group effect on accuracy (β = 0.008, p = 0.726; ICC = 0.007). Multivariable analysis identified lower baseline confidence (β = −0.066, p < 0.001) and AI accuracy (β = +0.052, p = 0.001) as independent predictors of decision change.

Conclusions: AI decision-support feedback produced a statistically significant net improvement in diagnostic decision quality (net benefit +32; p = 0.004), confirmed across evaluation-level, resident-level, and mixed-effects analyses. Effects were independent of clinical experience level. Lower baseline confidence was an independent predictor of AI-driven decision change, with implications for AI integration design and training. The coexistence of robust-resistance (S6) and missed-opportunity (S7) patterns underscores that both over-reliance and under-reliance are clinically relevant risks. These findings require replication in larger prospective studies before informing routine clinical practice.
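The headline figures in the abstract can be re-derived from the counts it quotes. The sketch below is only a reader's arithmetic check, not the authors' analysis code. It rests on several assumptions that the abstract does not state: that the binomial test compares the 86 improving changes against the 54 worsening ones with a null probability of 0.5, that the confidence interval is a simple Wald interval over all 1680 evaluations (ignoring clustering), and that the AI's correct calls can be recovered by rounding the reported sensitivity and specificity against 16 positive and 19 normal cases. All variable and function names are illustrative.

```python
from math import comb, sqrt

# Counts quoted in the abstract
n_eval = 48 * 35                 # residents x cases
improved, worsened = 86, 54      # decision changes that helped / hurt accuracy
net_benefit = improved - worsened

def binom_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Assumption: the test is run on the improving + worsening changes only.
n_discordant = improved + worsened
p_one_sided = binom_upper_tail(improved, n_discordant)
p_two_sided = min(1.0, 2 * p_one_sided)

def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Assumption: interval computed at evaluation level, without a clustering adjustment.
ci_low, ci_high = wald_ci(0.727, n_eval)

# Assumption: AI counts inferred by rounding the reported rates.
ai_true_pos = round(0.813 * 16)  # stroke-positive cases flagged by the AI
ai_true_neg = round(0.474 * 19)  # normal cases cleared by the AI
ai_accuracy = (ai_true_pos + ai_true_neg) / 35

print(f"evaluations: {n_eval}, net benefit: {net_benefit:+d}")
print(f"binomial p (one-sided): {p_one_sided:.4f}, (two-sided): {p_two_sided:.4f}")
print(f"initial accuracy 95% CI: {ci_low:.3f} to {ci_high:.3f}")
print(f"AI accuracy: {ai_accuracy:.3f}")
```

Under these assumptions the one-sided p-value lands near the reported 0.004, while the doubled (two-sided) value is roughly twice that, so the abstract's figure appears consistent with a one-sided test; the full text should be consulted to confirm which was used.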
