Chatbots' Performance in Premature Ejaculation Questions: A Comparative Analysis of Reliability, Readability, and Understandability

Serkan Gonultas
Sina Kardas
Mücahit Gelmiş
Abdullah Kinik
Mehmet Ozalevli
Mustafa Gökhan Köse
Suhejb Sulejman
Serhat Yentur
Burak Arslan

Abstract

Objective: This study aimed to evaluate the reliability, readability, and understandability of chatbot responses to frequently asked questions about premature ejaculation, and to assess the contributions, potential risks, and limitations of artificial intelligence.

Methods: Fifteen questions were selected using data from Google Trends and posed to the chatbots Copilot, Gemini, ChatGPT 4o, ChatGPT 4o Plus, and DeepSeek-R1. Reliability was evaluated by two experts using the Global Quality Scale (GQS). Readability was assessed with the Flesch Reading Ease (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG) scales. Understandability was evaluated using the Patient Educational Materials Assessment Tool for Printable Materials (PEMAT-P). Additionally, the consistency of source citations was examined.

Results: The GQS scores were as follows: Copilot: 3.96 ± 0.66, Gemini: 3.66 ± 0.78, ChatGPT 4o: 4.83 ± 0.23, ChatGPT 4o Plus: 4.83 ± 0.29, DeepSeek: 4.86 ± 0.22. The PEMAT-P scores were: Copilot: 0.70 ± 0.05, Gemini: 0.72 ± 0.04, ChatGPT 4o: 0.83 ± 0.03, ChatGPT 4o Plus: 0.77 ± 0.06, DeepSeek: 0.79 ± 0.06. While ChatGPT and DeepSeek scored higher for reliability and understandability, all chatbots performed at an acceptable level. However, readability scores were above the recommended level for the target audience. Instances of low reliability or unverified sources were noted, with no significant differences between the chatbots.

Conclusion: Chatbots provide highly reliable and informative responses regarding premature ejaculation; however, significant limitations remain, particularly concerning readability and the reliability of sources.
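The abstract does not say which tool computed the readability scores. As a rough illustration of two of the named scales, the sketch below applies the standard published FRES and FKGL formulas to word, sentence, and syllable counts. The function names and the vowel-group syllable counter are assumptions made for this example, not the authors' method; dedicated readability tools use more careful syllable and sentence detection.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) using simple regex splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


if __name__ == "__main__":
    # Hypothetical sample text, not taken from any chatbot response.
    sample = "Premature ejaculation is common. Several treatment options exist."
    w, s, syl = text_counts(sample)
    print(f"FRES: {flesch_reading_ease(w, s, syl):.1f}")
    print(f"FKGL: {flesch_kincaid_grade(w, s, syl):.1f}")
```

Both formulas penalize long sentences and polysyllabic words, which is why responses dense with medical terminology tend to score above the reading level recommended for patient materials.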

Version published to 10.21203/rs.3.rs-6131594/v1 on Research Square
Mar 20, 2025

Related articles

Simplifying cardiology research abstracts: assessing ChatGPT’s readability and comprehensibility for non-medical audiences

This article has 10 authors:
1. Kabir Malkani
2. Zachary Falk
3. Ruina Zhang
4. Ryan Hughes
5. Prianca Tawde
6. Melissa Parker
7. Griffin P. Collins
8. Danielle Maizes
9. Alexander Zhao
10. Vinay Kini
This article has no evaluations
Latest version Mar 23, 2025

Comparative accuracy of ChatGPT-o1, DeepSeek R1, and Gemini 2.0 in answering general primary care questions

This article has 5 authors:
1. Guerino Recinella
2. Chiara Altini
3. Marco Cupardo
4. Iacopo Cricelli
5. Lorenzo Maestri
This article has no evaluations
Latest version Apr 19, 2025

Patterns, Advances, and Gaps in Using ChatGPT and Similar Technologies in Nursing Education: A PAGER Scoping Review

This article has 8 authors:
1. Isaac Amankwaa, MS, PhD
2. Emmanuel Ekpor
3. Daniel Cudjoe
4. Emmanuel Kobiah
5. Abdul-Karim Jebuni Fuseini
6. Maximous Diebieri
7. Sabastin Gyamfi
8. Sharon Brownie
This article has no evaluations
Latest version Apr 22, 2025
