Evaluation of the Effectiveness of the ChatGPT Artificial Intelligence Application in the Diagnosis of Pneumothorax on Chest Radiograph Interpretation

Onur Akcay
Azat Özel
Özgür Öztürk
Tuba Acar
Ahmet Kayahan Tekneci
Tevfik İlker Akçam
Soner Gürsoy

Abstract

Background: Spontaneous pneumothorax is a potentially life-threatening condition commonly diagnosed using chest radiographs. However, interpreting chest X-rays can be challenging because of anatomical overlap and observer variability. This study aimed to evaluate the diagnostic accuracy of ChatGPT, a large language model (LLM), in detecting pneumothorax on chest radiographs compared with expert thoracic surgeons.

Methods: In this retrospective study, 220 chest radiographs were assessed. Expert consensus classified 110 cases as having pneumothorax and 110 as not having it. The images were uploaded to the GPT-4o model without any clinical information, and ChatGPT was asked to identify the presence or absence of pneumothorax. Diagnostic performance was evaluated by calculating sensitivity, specificity, accuracy, positive and negative predictive values, and the area under the receiver operating characteristic curve (AUC). Subgroup analyses were performed based on pneumothorax size.

Results: ChatGPT demonstrated an overall diagnostic accuracy of 83.7%, sensitivity of 70.9%, specificity of 96.4%, positive predictive value of 95.1%, and negative predictive value of 76.8%. The AUC was 0.836 (95% CI: 0.780-0.893). Diagnostic performance was higher for large pneumothoraces (AUC: 0.894) than for small pneumothoraces (AUC: 0.439). Cohen's kappa coefficient indicated substantial agreement with expert evaluations (κ = 0.673; 95% CI: 0.575-0.771).

Conclusions: ChatGPT shows promise in detecting pneumothorax on chest radiographs, particularly for large pneumothoraces. However, given its low sensitivity for small pneumothoraces, ChatGPT is not recommended for clinical decision-making. Any potential role, such as a supportive triage tool in settings with limited access to expert radiology services, should be limited to exploratory, supervised use.

Trial registration: Not applicable.
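The performance measures named in the abstract all follow from a single 2×2 confusion matrix, so a minimal Python sketch of the arithmetic may help. The abstract does not report the cell counts; the values used below (78 true positives, 32 false negatives, 106 true negatives, 4 false positives) are an assumption, back-calculated from the reported percentages and the 110/110 case split. With these counts the sensitivity, specificity, predictive values, and kappa match the reported figures to the stated precision, while the accuracy comes out at about 83.6%, a rounding-level difference from the reported 83.7%. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard diagnostic accuracy measures from a 2x2 table.

    tp/fn: reference-positive cases called positive/negative by the model.
    tn/fp: reference-negative cases called negative/positive by the model.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the marginal totals of the two raters.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        "kappa": kappa,
    }


# ASSUMED counts, reconstructed from the abstract's percentages;
# they are not stated in the source.
metrics = diagnostic_metrics(tp=78, fn=32, tn=106, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The confidence intervals and the size-specific AUC values cannot be recovered this way, since they depend on per-case data that the abstract does not provide.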

Version published to 10.21203/rs.3.rs-7297832/v1 on Research Square
Aug 27, 2025
