Evaluating the Diagnostic Performance of ChatGPT-4o Mini in the Classification of Chest X-Ray Pathologies

Abstract

Radiographic assessment of chest X-rays (CXR) is the current standard for diagnostic imaging, but advances in artificial intelligence (AI) have opened new possibilities for automated pathology detection. This study evaluates the ability of OpenAI's ChatGPT-4o mini to correctly identify 14 distinct chest X-ray pathologies, or to confirm the absence of pathology, using the VinDr-CXR dataset (specifically, its Kaggle subset). After the model was queried with a standardized prompt, key performance metrics (accuracy, precision, recall, and F1 score) were calculated, and a multi-class confusion matrix was analyzed to assess performance. Results revealed significant limitations in ChatGPT-4o mini's diagnostic capabilities, with an overall accuracy of 0.05 and macro-averaged precision, recall, and F1 scores of 0.28, 0.17, and 0.14, respectively. Moreover, several pathologies, including aortic enlargement, interstitial lung disease, and pneumothorax, were entirely misclassified. Performance variability across classes appeared to be associated with dataset imbalance: classes with higher support generally showed more favorable outcomes. These findings highlight the significant challenges the current ChatGPT-4o mini model faces in multi-class diagnostic classification and underscore the need for improved model training before integration into practical clinical scenarios can be undertaken.
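The macro-averaged metrics reported above can be computed directly from paired true and predicted labels. The pure-Python sketch below illustrates the standard definitions (per-class precision, recall, and F1 averaged with equal weight per class, which is why imbalanced classes with low support can drag the macro scores well below overall accuracy); the function name and the toy labels are illustrative, not taken from the study's code.

```python
def macro_metrics(y_true, y_pred, labels):
    """Accuracy and macro-averaged precision/recall/F1 for a
    multi-class task (e.g. 14 CXR pathologies plus 'No finding')."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Per-class counts: treat class c as "positive", all others as "negative".
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    # Macro averaging: every class counts equally, regardless of support.
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy example with three hypothetical classes:
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "C", "C"]
acc, mp, mr, mf1 = macro_metrics(y_true, y_pred, ["A", "B", "C"])
```

A class that is never predicted correctly (as reported for aortic enlargement, interstitial lung disease, and pneumothorax) contributes zeros to all three macro averages, which helps explain the low macro F1 of 0.14.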