Artificial Intelligence (AI) Assisted Decision Making in Malignant Pleural Mesothelioma: A Comparative Study of AI Responses

Abstract

Objective: This study evaluates the applicability of artificial intelligence (AI) in clinical decision-making for malignant pleural mesothelioma (MPM) by comparing treatment recommendations from large language models (LLMs) with expert decisions.

Methods: A retrospective analysis was conducted on 12 MPM cases treated between 2021 and 2023 at a tertiary university hospital. AI-generated recommendations from ChatGPT, Gemini, and Copilot were compared with multidisciplinary tumor board decisions regarding initial treatment strategy, radiotherapy (RT) timing, target volume delineation, dosimetric assessment, and RT plan approval. AI responses were scored on a 5-point Likert scale by three radiation oncologists. Quality and readability were assessed using the DISCERN scale and established readability metrics. Statistical analyses included intraclass correlation coefficients, Friedman tests, and Wilcoxon signed-rank tests with Bonferroni correction.

Results: The study analyzed 12 cases with 60 questions comparing the three LLMs in mesothelioma treatment decision-making. ChatGPT demonstrated superior performance, with the highest mean score (4.50 ± 0.57) and a median score of 5, significantly outperforming Gemini (mean 3.77 ± 0.43, median 4) and Copilot (mean 3.85 ± 0.52, median 4) (p < 0.001). Category-specific analysis showed that ChatGPT consistently excelled across all decision-making domains, particularly in RT timing and dosimetric data evaluation (median scores of 5). It significantly outperformed the other models in four of five categories: initial treatment recommendation, RT timing, RT planning, and dosimetric data evaluation (all p < 0.05). Gemini maintained moderate performance with median scores of 4 across all categories, while Copilot showed variable performance with median scores ranging from 3 to 4. In RT plan approval, ChatGPT and Gemini performed similarly (p = 1.000), while Copilot scored significantly lower (p = 0.025). ChatGPT achieved the highest DISCERN score (70/75, excellent quality), while Copilot (62/75) and Gemini (61/75) were rated as good. Readability analyses classified all AI outputs as "difficult to read," with Copilot being the most readable (Flesch Reading Ease score = 35.52).

Conclusion: Among the evaluated AI models, ChatGPT provided the most accurate and clinically relevant recommendations for MPM management. While AI tools show promise in decision support, further validation is required before integration into clinical workflows. Future research should focus on enhancing readability and reliability for clinical applications.
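The statistical workflow summarized in the Methods (a Friedman test across the three paired score sets, followed by pairwise Wilcoxon signed-rank tests with Bonferroni correction) can be sketched as follows. This is not the authors' code, and the Likert scores below are randomly generated placeholders, not study data; only the structure of the analysis reflects the abstract.

```python
# Sketch of the Friedman + post-hoc Wilcoxon (Bonferroni) pipeline
# described in the Methods. Scores are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# 60 paired Likert ratings (1-5) per model -- placeholder values only
chatgpt = rng.integers(4, 6, size=60)
gemini = rng.integers(3, 5, size=60)
copilot = rng.integers(3, 5, size=60)

# Omnibus test: do the three models differ in their paired ratings?
stat, p = stats.friedmanchisquare(chatgpt, gemini, copilot)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4g}")

# Post-hoc pairwise Wilcoxon signed-rank tests with Bonferroni correction
pairs = [
    ("ChatGPT vs Gemini", chatgpt, gemini),
    ("ChatGPT vs Copilot", chatgpt, copilot),
    ("Gemini vs Copilot", gemini, copilot),
]
alpha = 0.05 / len(pairs)  # Bonferroni-adjusted significance threshold
for name, a, b in pairs:
    w, pw = stats.wilcoxon(a, b)
    verdict = "significant" if pw < alpha else "not significant"
    print(f"{name}: W = {w:.1f}, p = {pw:.4g} ({verdict})")
```

Equivalently, one can multiply each pairwise p-value by the number of comparisons instead of dividing the alpha; the accept/reject decisions are identical.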
