Evaluating evidence-based health information from generative AI using a cross-sectional study with laypeople seeking screening information
Abstract
Large language models (LLMs) are used to seek health information. Guidelines for evidence-based health communication require the presentation of the best available evidence to support informed decision-making. We investigated the prompt-dependent guideline compliance of LLMs and evaluated a minimal behavioural intervention for boosting laypeople's prompting. Study 1 systematically varied prompt informedness, health topic, and LLM to evaluate compliance. Study 2 randomized 300 participants to three LLMs under standard or boosted prompting conditions. Blinded raters assessed LLM responses with two instruments. Study 1 found that LLMs failed to meet the standards of evidence-based health communication, and that response quality was contingent on prompt informedness. Study 2 revealed that laypeople's prompts frequently elicited poor-quality responses; the simple boost improved response quality, though it remained below the required standards. These findings underscore the inadequacy of LLMs as standalone health communication tools. Integrating LLMs with evidence-based frameworks, enhancing their reasoning and interfaces, and teaching effective prompting are essential.

Study Registration: German Clinical Trials Register (DRKS), Reg. No. DRKS00035228, registered on 15 October 2024.