Evaluating the Efficacy of Large Language Models in Addressing Patient-Centric Inquiries in Multiple Cancers
Abstract
Large Language Models (LLMs) have transformed how patients access health information online. Chatbots like ChatGPT allow users to ask direct questions and receive tailored answers almost instantly. However, for LLMs to be effective, the answers they provide must be reliable and accessible to patients. Our review assessed the reliability and accessibility of LLMs in answering patient inquiries about breast, prostate, and lung cancer.
Methods
A systematic search of the PubMed, Embase, and Web of Science databases was conducted. Studies were included if they were peer-reviewed original research, published in English, evaluating one or more LLMs in answering patients’ oncology questions. To enable aggregation of results, a linear transformation was applied to standardize scores from studies that used different Likert scales.
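As a minimal sketch of this standardization step (the review does not state its exact formula; the helper name rescale_likert and the 0–100 target range are assumptions for illustration):

```python
def rescale_likert(score: float, scale_min: int, scale_max: int) -> float:
    """Linearly map a rating from an arbitrary Likert scale onto 0-100.

    Hypothetical helper: the review describes a linear transformation to
    standardize different Likert scales but does not specify the target range.
    """
    return (score - scale_min) / (scale_max - scale_min) * 100

# A 4 on a 1-5 scale and a 7 on a 1-9 scale standardize to the same value.
assert rescale_likert(4, 1, 5) == rescale_likert(7, 1, 9) == 75.0
```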
Results
We identified three common measures of reliability (accuracy, quality, and consistency) and three measures of accessibility (readability, understandability, and actionability) across the thirty-six studies that met our inclusion criteria. Accuracy and quality scores showed broadly similar distributions, with median values of 79.0% and 76.5%, respectively. Consistency was high in the few studies that reported it (median = 100%). Readability scores for all LLMs fell well below the level recommended for patient-facing materials (median = 40.4%), although several studies reported substantial improvements through prompt engineering. Understandability (median = 69.0%) and, in particular, actionability (median = 40.0%) scores were lower than desired.
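For context on the readability finding, readability in such studies is typically quantified with standard formulas; the sketch below is an assumption, since the abstract does not name the metric the included studies used, and relies on the third-party textstat package's Flesch measures:

```python
import textstat  # third-party package: pip install textstat

answer = "Chemotherapy uses strong medicines to kill cancer cells."

# Flesch Reading Ease: higher is easier to read; scores around 40 correspond
# to college-level text, harder than the ~6th-grade level often recommended
# for patient-facing materials.
ease = textstat.flesch_reading_ease(answer)

# Flesch-Kincaid grade: the U.S. school grade needed to understand the text.
grade = textstat.flesch_kincaid_grade(answer)

print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```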
Conclusions
Despite current limitations, LLMs hold significant potential as assistive tools for disseminating health information to patients. Active involvement of physicians in model training and validation can help improve their performance.