AI-Assisted Decision-Making in Stroke Neuroimaging: A Prospective Study of Accuracy, Confidence, and Decision Change Among Emergency Medicine Residents
Abstract
Background
Artificial intelligence (AI)-based decision-support systems are increasingly integrated into emergency neuroimaging workflows. Evidence on whether AI feedback improves or worsens clinical decision accuracy remains limited. This prospective simulation study evaluated the effect of an AI decision-support system on emergency medicine residents' diagnostic accuracy, confidence, and decision-change patterns in stroke neuroimaging interpretation.

Methods
Forty-eight emergency medicine residents (experience: 1–46 months; median 25.5 months) each independently assessed 35 neuroimaging cases selected from a large, clinically confirmed institutional repository (16 stroke-positive, 19 normal; prevalence 45.7%). An independent expert radiologist without access to AI outputs selected the cases. Residents recorded an initial binary diagnosis and a confidence rating (1–5 Likert scale), then reviewed the AI output and made a final decision with an updated confidence rating. Each of the 1680 evaluations was assigned to one of eight mutually exclusive decision scenarios. To address clustering (48 residents × 35 cases), analyses were conducted at both the evaluation level (n = 1680) and the resident level (n = 48); mixed-effects linear regression was applied as a pre-specified sensitivity analysis.

Results
Residents' initial accuracy was 72.7% (95% CI 70.6–74.8%; MCC 0.457). The AI system, reported as a contextual comparator, yielded 62.9% accuracy (sensitivity 81.3%, specificity 47.4%). Following AI feedback, 86 of 179 changes (48.0%) improved accuracy, whereas 54 (30.2%) worsened it, a significant net benefit of +32 (binomial test, p = 0.004). Residents resisted erroneous AI output in 315 evaluations (21.0%; S6, Robust Resistance) and missed correct AI guidance in 117 (7.8%; S7, Missed Opportunity). AI feedback produced a modest but significant increase in confidence (Δ = +0.165; p < 0.001). Mixed-effects regression confirmed no significant effect of experience group on accuracy (β = 0.008, p = 0.726; ICC = 0.007). Multivariable analysis identified lower baseline confidence (β = −0.066, p < 0.001) and AI accuracy (β = +0.052, p = 0.001) as independent predictors of decision change.

Conclusions
AI decision-support feedback produced a statistically significant net improvement in diagnostic decision quality (net benefit +32; p = 0.004), confirmed across evaluation-level, resident-level, and mixed-effects analyses. Effects were independent of clinical experience level. Lower baseline confidence was an independent predictor of AI-driven decision change, with implications for AI integration design and training. The coexistence of robust-resistance (S6) and missed-opportunity (S7) patterns underscores that both over-reliance and under-reliance are clinically relevant risks. These findings require replication in larger prospective studies before they inform routine clinical practice.
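
The headline arithmetic in the Results can be re-derived from the counts the abstract reports. The Python sketch below is illustrative only and is not the authors' analysis code. It assumes the binomial test compares the 86 accuracy-improving changes against the 54 accuracy-worsening changes under a null proportion of 0.5; the abstract does not state the null proportion or whether the test was one-sided or two-sided, so both tail conventions are computed. The function and variable names are hypothetical.

```python
from math import comb


def upper_tail_half(k: int, n: int) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), via integer arithmetic."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return favourable / 2**n


# Counts reported in the abstract.
improved, worsened = 86, 54
net_benefit = improved - worsened      # reported as +32
n_discordant = improved + worsened     # changes that altered accuracy

# ASSUMPTION: null proportion 0.5 (sign-test style comparison).
p_one_sided = upper_tail_half(improved, n_discordant)
# Doubling the tail is exact here because Binomial(n, 0.5) is symmetric.
p_two_sided = min(1.0, 2 * p_one_sided)

# AI accuracy as the case-mix-weighted average of the reported
# sensitivity and specificity (16 stroke-positive, 19 normal cases).
n_stroke, n_normal = 16, 19
sensitivity, specificity = 0.813, 0.474
ai_accuracy = (sensitivity * n_stroke + specificity * n_normal) / (
    n_stroke + n_normal
)

print(f"net benefit: {net_benefit:+d} over {n_discordant} changes")
print(f"one-sided p: {p_one_sided:.4f}")
print(f"two-sided p: {p_two_sided:.4f}")
print(f"AI accuracy: {ai_accuracy:.3f}")
```

Because this treats every change as independent, it ignores the clustering of evaluations within residents and cases, which the study addresses separately through resident-level and mixed-effects analyses.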