Diagnostic Performance of Expert Physicians Versus General-Purpose Artificial Intelligence Using Standardized Static Coronary CT Images: A Dual-Reference Validation Study
Abstract
Background: Coronary CT angiography (CCTA) is a first-line diagnostic modality for coronary artery disease (CAD), yet its interpretation requires substantial expert experience. Although general-purpose multimodal artificial intelligence (GP-AI) models have shown promise in text-based medical tasks, their visual diagnostic performance on complex CCTA data remains poorly defined.

Methods: This single-center retrospective study included 63 patients (252 vessel-based image sets) who underwent both CCTA and invasive coronary angiography. Expert physician consensus and four frontier GP-AI models (GPT-4o, Gemini 2.5, Claude 3.5 Sonnet, and Grok 4) evaluated identical standardized static images using a zero-shot approach with default generation parameters. Obstructive disease was defined as ≥ 50% luminal stenosis. Diagnostic performance was validated against expert consensus for plaque characterization and against quantitative coronary angiography (QCA) for stenosis severity.

Results: Expert consensus demonstrated robust agreement with QCA across all coronary territories (κ = 0.774–0.933, p < 0.001). In contrast, a marked performance disparity was observed for the GP-AI models: none achieved statistically significant agreement with QCA in the prognostically critical left anterior descending artery (LAD) or left main coronary artery (LMCA) (p > 0.05). Although Gemini 2.5 showed a moderate correlation in the right coronary artery (ICC = 0.515), continuous stenosis assessment and plaque characterization remained uniformly limited and clinically unreliable across all models.

Conclusion: Expert physician interpretation remains the reference standard for CCTA. Current frontier GP-AI models are not suitable for independent clinical interpretation of coronary imaging, particularly in anatomically complex segments.
These findings emphasize that general visual reasoning cannot yet replace domain-specific cardiovascular AI solutions or expert clinical judgment in specialized radiological tasks.