Guidelines vs Generative AI in CKD Patient Education: The Role of Prompt Engineering and Expert Blinded Evaluation

Abstract

This study aimed to evaluate the accuracy, content quality, and readability of patient education responses related to chronic kidney disease (CKD) generated by large language models (ChatGPT-4o mini and Gemini) compared with clinical guidelines. Fifteen frequently asked CKD-related questions were selected using global Google Trends data and posed to both AI models and guideline-based sources. Responses were anonymized and evaluated by four independent nephrology professors using the CLEAR Tool, assessing completeness, appropriateness, evidence basis, and clarity. Both AI models significantly outperformed guideline responses across all CLEAR Tool domains (p < 0.001), with ChatGPT-4o mini achieving the highest median score (21.0 [IQR: 5.0] vs. Gemini: 17.0 [IQR: 5.0], Guideline: 13.0 [IQR: 2.0]). Initial readability analysis showed that guideline responses were easier to comprehend (FKGL: 9.40; FRE: 52.01) than AI-generated content (ChatGPT FKGL: 11.34, FRE: 36.17; Gemini FKGL: 9.62, FRE: 46.36). However, when a standardized instructional prompt was applied, AI responses demonstrated significant improvements in readability, reducing the required literacy level to approximately a 7th-grade level (ChatGPT FKGL: 7.87, FRE: 64.23; Gemini FKGL: 7.13, FRE: 61.45). These findings highlight the potential of prompt-guided AI models to generate accurate, accessible educational content for CKD. Prompt engineering emerges as a practical tool to enhance clarity and usability, particularly for populations with limited health literacy. Integration with frameworks like Retrieval-Augmented Generation may further improve reliability and safety in digital health communication.
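For readers unfamiliar with the two readability metrics reported above, the following sketch shows the standard published formulas for the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. Note that the study does not specify which software computed its scores; real tools differ mainly in how they count syllables, which is approximated here by simple word, sentence, and syllable counts supplied by the caller.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text.

    Standard formula; 60-70 is roughly plain English, below 50 is
    difficult (the pre-prompt ChatGPT FRE of 36.17 falls in this band).
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed.

    An FKGL near 7 (as achieved after prompt engineering) corresponds
    to roughly 7th-grade reading ability.
    """
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Illustrative counts (hypothetical, not from the study's corpus):
# a 100-word passage with 7 sentences and 140 syllables.
fre = flesch_reading_ease(100, 7, 140)    # ≈ 73.9 (fairly easy)
fkgl = flesch_kincaid_grade(100, 7, 140)  # ≈ 6.5 (mid 6th grade)
```

Shorter sentences lower the words-per-sentence term and simpler vocabulary lowers the syllables-per-word term, which is precisely the effect a "write at a 7th-grade level" instructional prompt aims to produce.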
