Evaluating Evidence-Based Communication through Generative AI using a Cross-Sectional Study with Laypeople Seeking Screening Information
Abstract
Large language models (LLMs) are used to seek health information. We investigate the prompt-dependent compliance of LLMs with evidence-based health communication guidelines and evaluate the efficacy of a minimal behavioral intervention for boosting laypeople's prompting. Study 1 systematically varied prompt informedness, health topic, and LLM to evaluate compliance. Study 2 randomized 300 UK participants to interact with LLMs under standard or boosted prompting conditions. Independent, blinded raters assessed LLM responses with two instruments. Study 1 found that LLMs failed to meet evidence-based health communication standards, even with informed prompting, and that response quality was contingent on prompt informedness. Study 2 revealed that laypeople's unassisted prompts frequently elicited poor-quality responses; a simple boost improved response quality, though it remained below optimal standards. These findings underscore the inadequacy of LLMs as a standalone health communication tool. It is imperative to enhance LLM interfaces, integrate them with evidence-based frameworks, and teach prompt engineering.
Study Registration: German Clinical Trials Register (DRKS), Reg. No. DRKS00035228
Ethical Approval: Ethics Committee of the University of Potsdam (Approval No. 52/2024)