Evaluating ChatGPT-4’s Role in Diagnosing and Grading Diabetic Retinopathy from Fundus Images

Abstract

Objective: To evaluate ChatGPT-4's ability to diagnose and grade diabetic retinopathy (DR) and diabetic macular edema (DME) from fundus images, with potential application in resource-limited settings.

Methods: A total of 516 images from the Indian Diabetic Retinopathy Image Dataset (IDRiD), each with prior expert grading for DR and DME, were used. ChatGPT-4 was first asked to generate an optimized prompt for DR and DME analysis. Each image was then submitted in a new chat session with memory turned off to prevent bias. Model performance was evaluated using accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), mean absolute error (MAE), and quadratic weighted kappa (QWK).

Results: For DR detection, ChatGPT-4 achieved an accuracy of 79.7%, a sensitivity of 81.6%, a specificity of 75.6%, and an AUC of 0.79; DR grading showed moderate agreement (QWK: 0.52). For DME detection, accuracy was 82%, with a sensitivity of 84.1%, a specificity of 79.2%, and an AUC of 0.82; DME grading showed higher agreement (QWK: 0.71). The MAEs for DR and DME grading were 0.82 and 0.33, respectively.

Conclusion: ChatGPT-4 shows potential for detecting and grading DR and DME from fundus images but underperforms specialized AI models. Given the growing burden of DR and the critical importance of early detection, ChatGPT-4 may support screening efforts in resource-limited settings. Its accessibility, low cost, and continuous evolution make it a potentially valuable clinical tool for reducing vision loss among underserved populations. While ChatGPT-4 cannot replace clinical expertise, it may assist in the detection, monitoring, and referral of patients, particularly in areas with limited access to ophthalmologic care.
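The quadratic weighted kappa used to assess grading agreement penalizes large grade discrepancies more heavily than small ones. A minimal sketch of the standard computation is shown below; this is an illustration of the metric, not the authors' evaluation code, and the function name and class count are assumptions.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa between two integer gradings in 0..n_classes-1.

    Note: illustrative implementation of the standard QWK formula,
    not the code used in the study.
    """
    # Observed confusion matrix of rater agreement
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Expected matrix under chance agreement (outer product of marginals,
    # scaled so its total matches the observed total)
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    # Quadratic disagreement weights: 0 on the diagonal, growing with
    # the squared distance between grades
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()
```

Perfect agreement yields a QWK of 1.0, chance-level agreement yields 0, and values such as the reported 0.52 for DR grading indicate moderate agreement. The same value can also be obtained with `sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")`.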
