Diagnostic Performance of Expert Physicians Versus General-Purpose Artificial Intelligence Using Standardized Static Coronary CT Images: A Dual-Reference Validation Study
Abstract
Background: Coronary CT angiography (CCTA) is a first-line diagnostic modality for coronary artery disease (CAD), yet its interpretation requires substantial expert experience. Although general-purpose multimodal artificial intelligence (GP-AI) models have shown promise in text-based medical tasks, their visual diagnostic performance on complex CCTA data remains poorly defined.

Methods: This single-center retrospective study included 63 patients (252 vessel-based image sets) who underwent both CCTA and invasive coronary angiography. Expert physician consensus and four frontier GP-AI models (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) evaluated identical standardized static images using a zero-shot approach with default generation parameters. Obstructive disease was defined as ≥ 50% luminal stenosis. Diagnostic performance was validated against expert consensus for plaque characterization and against quantitative coronary angiography (QCA) for stenosis severity.

Results: Expert consensus demonstrated robust agreement with QCA across all coronary territories (κ = 0.774–0.933, p < 0.001). In contrast, a marked performance disparity was observed for the GP-AI models: none achieved statistically significant agreement with QCA in the prognostically critical left anterior descending artery (LAD) or left main coronary artery (LMCA) (p > 0.05). Although Gemini 2.5 showed a moderate correlation in the right coronary artery (ICC = 0.515), continuous stenosis assessment and plaque characterization remained uniformly limited and clinically unreliable across all models.

Conclusion: Expert physician interpretation remains the reference standard for CCTA. Current frontier GP-AI models are not suitable for independent clinical interpretation of coronary imaging, particularly in anatomically complex segments.
These findings emphasize that general visual reasoning cannot yet replace domain-specific cardiovascular AI solutions or expert clinical judgment in specialized radiological tasks.