Artificial Intelligence in Neuro-Oncology: Assessing ChatGPT’s Accuracy in MRI Interpretation and Treatment Advice

Abstract

Purpose

Large language models (LLMs) have demonstrated advanced capabilities in interpreting text and visual inputs. Their potential to transform oncological practice is significant, but their accuracy and reliability in interpreting medical imaging and offering management suggestions remain underexplored. This study aimed to evaluate the performance of ChatGPT in interpreting T1-weighted contrast-enhanced MRI images of meningiomas and glioblastomas and providing treatment recommendations based on simulated patient inquiries.

Methods

This observational cohort study utilized publicly available MRI datasets. Thirty cases of meningiomas and glioblastomas were randomly selected, yielding 90 images (three orthogonal planes per case). ChatGPT-4o was tasked with interpreting these images and responding to six standardized patient-simulated questions. Two neuroradiologists and two neurosurgeons assessed ChatGPT’s performance on five-point Likert scales, and inter-rater agreement was evaluated.
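
For readers who want to replicate a similar workflow, the following is a minimal sketch of how a single MRI slice and a patient-simulated question could be submitted to GPT-4o through the OpenAI Python SDK. The file path, prompt wording, and helper function are illustrative assumptions, not the study's actual protocol or prompts.

```python
# Minimal sketch, assuming the OpenAI Python SDK (>= 1.0) and a local PNG slice.
# File path, prompt text, and helper name are hypothetical illustrations.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_about_mri(image_path: str, question: str) -> str:
    """Send one MRI slice plus a patient-simulated question to GPT-4o."""
    with open(image_path, "rb") as f:
        b64_image = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64_image}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content


# Hypothetical usage: one axial contrast-enhanced T1-weighted slice.
print(ask_about_mri("case_01_axial.png", "What type of brain tumor could this be?"))
```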

Results

ChatGPT identified MRI sequences with 91.7% accuracy and localized tumors correctly in 66.7% of cases. Tumor size was qualitatively described in 85% of cases, with a median acceptability rating of 4.0 (IQR 4.0–5.0) from neuroradiologists. ChatGPT included meningioma in the differential diagnosis for 73.3% of meningioma cases and glioma in 83.3% of glioblastoma cases. Inter-rater agreement among neuroradiologists ranged from moderate to good (κ = 0.45–0.72). While surgical treatment was suggested in all symptomatic cases, neurosurgeon acceptability ratings varied, with poor inter-rater reliability.
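
As an illustration of the agreement statistics reported above, Cohen's kappa between two raters' five-point Likert scores can be computed as sketched below. The rating vectors are fabricated placeholders, not the study's data, and the choice of quadratic weighting is one common option for ordinal scales rather than the study's stated method.

```python
# Sketch of inter-rater agreement on five-point Likert ratings using Cohen's kappa.
# The rating vectors below are invented placeholders, not data from the study.
from sklearn.metrics import cohen_kappa_score

rater_a = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]  # hypothetical Likert scores, rater A
rater_b = [4, 4, 3, 5, 2, 5, 3, 3, 4, 4]  # hypothetical Likert scores, rater B

# Unweighted kappa treats every disagreement equally; quadratic weighting
# penalizes larger gaps on the ordinal scale more heavily.
kappa_unweighted = cohen_kappa_score(rater_a, rater_b)
kappa_weighted = cohen_kappa_score(rater_a, rater_b, weights="quadratic")

print(f"Unweighted kappa: {kappa_unweighted:.2f}")
print(f"Quadratic-weighted kappa: {kappa_weighted:.2f}")
```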

Conclusions

ChatGPT demonstrates potential in interpreting neuro-oncological MRI images and offering preliminary management recommendations. However, errors in tumor localization and variability in recommendation acceptability underscore the need for physician oversight and further refinement of LLMs before clinical integration.