Responses of AI Chatbots to Testosterone Replacement Therapy: Patients Beware!

Abstract

Purpose: Using chatbots to seek healthcare information is becoming more popular. Misinformation and gaps in knowledge exist regarding the risks and benefits of Testosterone Replacement Therapy (TRT). We aimed to assess and compare the quality and readability of responses generated by four AI chatbots.

Materials and Methods: ChatGPT, Google Bard, Bing Chat, and Perplexity AI were asked the same eleven questions regarding TRT. The responses were evaluated by four reviewers using the DISCERN and Patient Education Materials Assessment Tool (PEMAT) questionnaires. Readability was assessed using the Readability Scoring system v2.0 to calculate the Flesch-Kincaid Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Kruskal-Wallis tests were performed using GraphPad Prism v10.1.0.

Results: Google Bard received the highest DISCERN and PEMAT scores. Perplexity AI received the highest FRES and the best (lowest) FKGL. Significant differences were found in understandability between Bing Chat and Google Bard, in DISCERN scores between Bing Chat and Google Bard, in FRES between ChatGPT and Perplexity AI, and in FKGL between ChatGPT and Perplexity AI.

Conclusion: ChatGPT and Google Bard were the top performers based on quality, understandability, and actionability. Although Perplexity AI scored higher in readability, its generated text still maintained an eleventh-grade complexity. Perplexity AI stood out for its extensive use of citations; however, it offered repetitive answers despite the diversity of questions posed to it. Google Bard demonstrated a high level of detail in its answers, offering additional value through visual aids.
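For context, the two readability metrics reported above have standard published formulas based on word, sentence, and syllable counts. The sketch below shows those formulas directly; it is a minimal illustration, not the exact implementation used by the Readability Scoring system v2.0 (which also handles syllable counting, a step omitted here).

```python
# Standard Flesch-Kincaid formulas, as used in readability assessment.
# Inputs are precomputed counts; syllable counting itself is tool-dependent
# and is not implemented in this sketch.

def flesch_reading_ease(words, sentences, syllables):
    """FRES: higher scores indicate easier text (60-70 is roughly plain English)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """FKGL: approximate U.S. school grade level required to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Example counts (hypothetical): a fairly dense passage.
fres = flesch_reading_ease(words=100, sentences=5, syllables=160)
fkgl = flesch_kincaid_grade(words=100, sentences=5, syllables=160)
```

With these example counts, the FKGL lands at roughly grade 11, comparable to the eleventh-grade complexity reported for the chatbot responses in this study.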
