Evaluation of the Effectiveness of the ChatGPT Artificial Intelligence Application in the Diagnosis of Pneumothorax on Chest Radiograph Interpretation
Abstract
Background: Spontaneous pneumothorax is a potentially life-threatening condition commonly diagnosed using chest radiographs. However, interpreting chest X-rays can be challenging due to anatomical overlap and observer variability. This study aimed to evaluate the diagnostic accuracy of ChatGPT, a large language model (LLM), in detecting pneumothorax on chest radiographs, using expert thoracic surgeons' assessments as the reference standard.

Methods: In this retrospective study, 220 chest radiographs were assessed. Expert consensus classified 110 cases as having pneumothorax and 110 as not having it. The images were uploaded to the GPT-4o model without any clinical information, and ChatGPT was asked to identify the presence or absence of pneumothorax. Diagnostic performance was evaluated by calculating sensitivity, specificity, accuracy, positive and negative predictive values, and the area under the receiver operating characteristic curve (AUC). Subgroup analyses were performed based on pneumothorax size.

Results: ChatGPT demonstrated an overall diagnostic accuracy of 83.7%, sensitivity of 70.9%, specificity of 96.4%, positive predictive value of 95.1%, and negative predictive value of 76.8%. The AUC was 0.836 (95% CI: 0.780-0.893). Diagnostic performance was higher for large pneumothoraces (AUC: 0.894) than for small pneumothoraces (AUC: 0.439). Cohen's kappa coefficient indicated substantial agreement with expert evaluations (κ = 0.673; 95% CI: 0.575-0.771).

Conclusions: ChatGPT shows promise in detecting pneumothorax on chest radiographs, particularly for large pneumothoraces. However, its limited sensitivity for small pneumothoraces underscores the need for cautious clinical application. ChatGPT may serve as a supportive triage tool in settings with limited access to expert radiology services. Given the low sensitivity for small pneumothoraces, it is not recommended for clinical decision-making; any potential role should be limited to exploratory, supervised triage settings.

Trial registration: Not applicable.
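As an illustration of how the metrics named in the abstract relate to one another, the following minimal Python sketch computes them from a 2×2 confusion matrix. The cell counts (78 true positives, 32 false negatives, 106 true negatives, 4 false positives) are not stated in the abstract; they are an assumption, back-calculated from the reported sensitivity and specificity and the 110/110 case split. The function name `diagnostic_metrics` is hypothetical, and treating the AUC of a single yes/no reading as the mean of sensitivity and specificity is likewise an assumption, not necessarily the authors' method.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard diagnostic accuracy metrics from a 2x2 confusion matrix."""
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / total
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    # Assumption: for one binary rating (a single ROC operating point),
    # the trapezoidal AUC reduces to the mean of sensitivity and specificity.
    auc_single_point = (sensitivity + specificity) / 2

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
        "auc_single_point": auc_single_point,
    }


# Assumed counts, inferred from the reported percentages (not given in the abstract).
metrics = diagnostic_metrics(tp=78, fn=32, tn=106, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, sensitivity, specificity, both predictive values, kappa, and the AUC agree with the reported figures after rounding, while accuracy comes out at about 83.6%, marginally below the reported 83.7%.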