Comparison of the accuracy and reliability of ChatGPT-4o and Gemini in answering HIV-related questions
Abstract
Background: This is the first study to evaluate the accuracy and reliability of the ChatGPT and Gemini chatbots on HIV-related questions.

Methods: A total of 156 questions about HIV in 3 different categories (CDC, guideline, and social media) were posed to both ChatGPT and Gemini. The chatbots' answers were scored on a scale of 1 to 4 (1 = completely wrong, 4 = completely correct) by two independent infectious disease experts. The reproducibility of both chatbots' answers was also analysed.

Results: The mean score of the answers generated for all questions was 3.69 ± 0.72 for ChatGPT and 3.55 ± 0.81 for Gemini (p = 0.051). The rate of completely correct answers was 81.4% for ChatGPT and 71.8% for Gemini (p = 0.045). ChatGPT answered guideline questions with lower accuracy than CDC questions (47.9% vs. 97.1%, p < 0.001) and social media questions (47.9% vs. 94.9%, p < 0.001). Similarly, Gemini answered guideline questions with lower accuracy than CDC questions (35.4% vs. 88.4%, p < 0.001) and social media questions (35.4% vs. 87.2%, p < 0.001). When the questions were grouped by topic, both chatbots had their lowest accuracy on ‘Prevention and Treatment’ (67.2% for ChatGPT, 54.7% for Gemini). The reproducibility of the answers was 94.8% for ChatGPT and 90.3% for Gemini.

Conclusion: ChatGPT and Gemini answered the CDC and social media questions with high accuracy. However, both chatbots performed poorly on guideline questions and questions about ‘Prevention and Treatment’, and these applications need improvement in those areas before they can be relied on by healthcare professionals.
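As an illustrative aside: the abstract does not state which statistical test produced p = 0.045 for the completely-correct rates, but that value is consistent with a chi-square test without continuity correction on counts reconstructed from the reported percentages (81.4% and 71.8% of 156 questions, i.e. roughly 127 and 112 completely correct answers). The sketch below is only a plausible reconstruction under those assumptions, not the authors' documented analysis.

```python
# Sketch: reconstructing the completely-correct-rate comparison from the abstract.
# Assumptions (not stated in the paper): counts are derived from the reported
# percentages of 156 questions, and the comparison is a chi-square test of
# independence without continuity correction.
from scipy.stats import chi2_contingency

N_QUESTIONS = 156
chatgpt_correct = round(0.814 * N_QUESTIONS)  # ~127 completely correct answers
gemini_correct = round(0.718 * N_QUESTIONS)   # ~112 completely correct answers

# 2x2 contingency table: rows = chatbot, columns = (completely correct, not)
table = [
    [chatgpt_correct, N_QUESTIONS - chatgpt_correct],
    [gemini_correct, N_QUESTIONS - gemini_correct],
]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # p ≈ 0.045, matching the abstract
```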