Guidelines vs Generative AI in CKD Patient Education: The Role of Prompt Engineering and Expert Blinded Evaluation

Abstract

This study aimed to evaluate the accuracy, content quality, and readability of patient education responses related to chronic kidney disease (CKD) generated by large language models (ChatGPT-4o mini and Gemini) compared with clinical guidelines. Fifteen frequently asked CKD-related questions were selected using global Google Trends data and posed to both AI models and guideline-based sources. Responses were anonymized and evaluated by four independent nephrology professors using the CLEAR Tool, assessing completeness, appropriateness, evidence basis, and clarity. Both AI models significantly outperformed guideline responses across all CLEAR Tool domains (p < 0.001), with ChatGPT-4o mini achieving the highest median score (21.0 [IQR: 5.0] vs. Gemini: 17.0 [IQR: 5.0], Guideline: 13.0 [IQR: 2.0]). Initial readability analysis showed that guideline responses were easier to comprehend (FKGL: 9.40; FRE: 52.01) than AI-generated content (ChatGPT FKGL: 11.34, FRE: 36.17; Gemini FKGL: 9.62, FRE: 46.36). However, when a standardized instructional prompt was applied, AI responses demonstrated significant improvements in readability, reducing the required literacy level to approximately a 7th-grade level (ChatGPT FKGL: 7.87, FRE: 64.23; Gemini FKGL: 7.13, FRE: 61.45). These findings highlight the potential of prompt-guided AI models to generate accurate, accessible educational content for CKD. Prompt engineering emerges as a practical tool to enhance clarity and usability, particularly for populations with limited health literacy. Integration with frameworks like Retrieval-Augmented Generation may further improve reliability and safety in digital health communication.
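For readers unfamiliar with the two readability metrics reported above, the following sketch shows the standard published formulas for the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. Note that the study does not specify which software computed its scores; real tools differ mainly in how they count syllables, which is approximated here by simple word, sentence, and syllable counts supplied by the caller.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text.

    Standard formula; 60-70 is roughly plain English, below 50 is
    difficult (the pre-prompt ChatGPT FRE of 36.17 falls in this band).
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed.

    An FKGL near 7 (as achieved after prompt engineering) corresponds
    to roughly 7th-grade reading ability.
    """
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Illustrative counts (hypothetical, not from the study's corpus):
# a 100-word passage with 7 sentences and 140 syllables.
fre = flesch_reading_ease(100, 7, 140)    # ≈ 73.9 (fairly easy)
fkgl = flesch_kincaid_grade(100, 7, 140)  # ≈ 6.5 (mid 6th grade)
```

Shorter sentences lower the words-per-sentence term and simpler vocabulary lowers the syllables-per-word term, which is precisely the effect a "write at a 7th-grade level" instructional prompt aims to produce.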
