Comparative Readability of Large Language Model Responses to Male Infertility Questions: Impact of Contextual Prompting
Abstract
Large language models (LLMs) are increasingly used by patients seeking medical information, yet the readability of LLM-generated content on male infertility remains insufficiently characterized. We evaluated the readability of responses generated by five widely used LLM platforms to frequently asked questions (FAQs) on male infertility collected from urology association and hospital websites. Fifty-four FAQs were submitted to OpenAI (ChatGPT-5/5-mini), Claude (Sonnet 4.5), Google Gemini (2.5 Flash), DeepSeek (V3), and Grok (V3/V4) under two conditions: (1) no additional context and (2) contextual prompting that directed the model to explain its answer to a lay patient or couple worried about male infertility. Readability was assessed using the Flesch-Kincaid Reading Ease (FRE) and SMOG indices. In the non-prompted condition, DeepSeek generated the most readable responses (mean FRE 46.28±6.86; mean SMOG 11.71±0.76), whereas Claude produced the least readable outputs (mean FRE 23.83±12.29; mean SMOG 16.37±2.47). After prompting, Grok generated the most readable responses (mean FRE 69.71±6.09; mean SMOG 9.96±1.04), and readability improved across all models. These findings suggest that simple contextual prompting can substantially enhance the readability of LLM-generated male infertility education; however, readability gains must be paired with ongoing verification of clinical accuracy to mitigate misinformation risk.
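The two indices used above follow standard published formulas: Flesch Reading Ease scores text from roughly 0 (very difficult) to 100 (very easy) using sentence length and syllables per word, while SMOG estimates the school grade level from the count of polysyllabic (3+ syllable) words. A minimal Python sketch is shown below; note that the regex-based syllable counter is a crude heuristic of our own (real tools such as the `textstat` package use more careful, dictionary-informed counts), so scores from this sketch will only approximate those reported in the abstract's methodology.

```python
import math
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups, discount a trailing silent "e".
    # This is an assumption for illustration, not the counter used in the study.
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def _tokenize(text: str):
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    return sentences, words

def flesch_reading_ease(text: str) -> float:
    # FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    sentences, words = _tokenize(text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

def smog_index(text: str) -> float:
    # SMOG = 1.0430 * sqrt(polysyllables * 30/sentences) + 3.1291
    sentences, words = _tokenize(text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291
```

Under both formulas, shorter sentences and shorter words yield a higher FRE and a lower SMOG grade, which is why prompting models to address a lay audience moves both metrics in the "easier" direction.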