Comparison of the accuracy and reliability of ChatGPT-4o and Gemini in answering HIV-related questions
Abstract
Background: This is the first study to evaluate the accuracy and reliability of the ChatGPT and Gemini chatbots on HIV-related questions.

Methods: A total of 156 questions about HIV in 3 different categories (CDC, guideline, and social media) were posed to both ChatGPT and Gemini. The chatbots' answers were scored on a scale of 1 to 4 (1 = completely wrong, 4 = completely correct) by two independent infectious disease experts. The reproducibility of both chatbots' answers was also analysed.

Results: The mean score of the answers generated for all questions was 3.69 ± 0.72 for ChatGPT and 3.55 ± 0.81 for Gemini (p = 0.051). The rate of completely correct answers was 81.4% for ChatGPT and 71.8% for Gemini (p = 0.045). ChatGPT answered guideline questions with lower accuracy than CDC questions (47.9% vs. 97.1%, p < 0.001) and social media questions (47.9% vs. 94.9%, p < 0.001). Similarly, Gemini answered guideline questions with lower accuracy than CDC questions (35.4% vs. 88.4%, p < 0.001) and social media questions (35.4% vs. 87.2%, p < 0.001). When the questions were grouped by topic, both chatbots had their lowest accuracy on ‘Prevention and Treatment’ (67.2% for ChatGPT, 54.7% for Gemini). The reproducibility of the answers was 94.8% for ChatGPT and 90.3% for Gemini.

Conclusion: ChatGPT and Gemini answered the CDC and social media questions with high accuracy. However, both chatbots performed poorly on guideline questions and questions about ‘Prevention and Treatment’, and these applications need improvement in those areas before they can be relied on by healthcare professionals.
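As an illustrative aside: the abstract does not state which statistical test produced p = 0.045 for the completely-correct rates, but that value is consistent with a chi-square test without continuity correction on counts reconstructed from the reported percentages (81.4% and 71.8% of 156 questions, i.e. roughly 127 and 112 completely correct answers). The sketch below is only a plausible reconstruction under those assumptions, not the authors' documented analysis.

```python
# Sketch: reconstructing the completely-correct-rate comparison from the abstract.
# Assumptions (not stated in the paper): counts are derived from the reported
# percentages of 156 questions, and the comparison is a chi-square test of
# independence without continuity correction.
from scipy.stats import chi2_contingency

N_QUESTIONS = 156
chatgpt_correct = round(0.814 * N_QUESTIONS)  # ~127 completely correct answers
gemini_correct = round(0.718 * N_QUESTIONS)   # ~112 completely correct answers

# 2x2 contingency table: rows = chatbot, columns = (completely correct, not)
table = [
    [chatgpt_correct, N_QUESTIONS - chatgpt_correct],
    [gemini_correct, N_QUESTIONS - gemini_correct],
]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # p ≈ 0.045, matching the abstract
```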