Evaluating the Efficacy of Large Language Models in Addressing Patient-Centric Inquiries in Multiple Cancers
Abstract
Large Language Models (LLMs) have transformed how patients access health information online. Chatbots like ChatGPT allow users to ask direct questions and receive tailored answers almost instantly. However, for LLMs to be effective, the answers they provide must be reliable and accessible to patients. Our review assessed the reliability and accessibility of LLMs in answering patient inquiries about breast, prostate, and lung cancer.
Methods
A systematic search of the PubMed, Embase, and Web of Science databases was conducted. Studies were included if they were peer-reviewed original research, published in English, evaluating one or more LLMs in answering patients’ oncology questions. To enable aggregation of results, a linear transformation was applied to standardize scores from studies that used different Likert scales.
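As a minimal sketch of this standardization step (the review does not state its exact formula; the helper name rescale_likert and the 0–100 target range are assumptions for illustration):

```python
def rescale_likert(score: float, scale_min: int, scale_max: int) -> float:
    """Linearly map a rating from an arbitrary Likert scale onto 0-100.

    Hypothetical helper: the review describes a linear transformation to
    standardize different Likert scales but does not specify the target range.
    """
    return (score - scale_min) / (scale_max - scale_min) * 100

# A 4 on a 1-5 scale and a 7 on a 1-9 scale standardize to the same value.
assert rescale_likert(4, 1, 5) == rescale_likert(7, 1, 9) == 75.0
```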
Results
We identified three common measures of reliability (accuracy, quality, and consistency) and three measures of accessibility (readability, understandability, and actionability) across the thirty-six studies that met our inclusion criteria. Accuracy and quality scores showed broadly similar distributions, with median values of 79.0% and 76.5%, respectively. Consistency was high in the few studies that reported it (median = 100%). Readability scores for all LLMs fell well below the level recommended for patient-facing materials (median = 40.4%), although several studies reported substantial improvements through prompt engineering. Understandability (median = 69.0%) and, in particular, actionability (median = 40.0%) scores were lower than desired.
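For context on the readability finding, readability in such studies is typically quantified with standard formulas; the sketch below is an assumption, since the abstract does not name the metric the included studies used, and relies on the third-party textstat package's Flesch measures:

```python
import textstat  # third-party package: pip install textstat

answer = "Chemotherapy uses strong medicines to kill cancer cells."

# Flesch Reading Ease: higher is easier to read; scores around 40 correspond
# to college-level text, harder than the ~6th-grade level often recommended
# for patient-facing materials.
ease = textstat.flesch_reading_ease(answer)

# Flesch-Kincaid grade: the U.S. school grade needed to understand the text.
grade = textstat.flesch_kincaid_grade(answer)

print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```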
Conclusions
Despite current limitations, LLMs hold significant potential as assistive tools for disseminating health information to patients. Active involvement of physicians in model training and validation can help improve their performance.