How easily can AI chatbots spread misinformation in audiology and otolaryngology?

Abstract

Background

Chatbots powered by large language models (LLMs) have recently emerged as prominent sources of information. However, their potential to propagate misinformation alongside accurate information, particularly in specialized fields such as audiology and otolaryngology, remains underexplored. This study evaluated the accuracy of six popular chatbots – ChatGPT, Gemini, Claude, DeepSeek, Grok, and Mistral – in response to questions framed around a range of unproven methods in audiological and otolaryngological care.

Methods

A set of 50 questions was developed based on common conversations between patients and clinicians. Each question was posed to each of the six chatbots 10 times to account for response variability, yielding a total of 3,000 responses (50 questions × 6 chatbots × 10 trials). The responses were scored against reference answers established by the consensus of 11 professionals, and the consistency of repeated responses was evaluated with Cohen's kappa.
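The abstract does not specify exactly how Cohen's kappa was applied across the ten repetitions; as a rough illustration of the consistency measure itself, the sketch below (Python, with hypothetical "accurate"/"inaccurate" labels) computes kappa between two runs of categorical judgments, correcting observed agreement for agreement expected by chance.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two sets of categorical labels,
    corrected for the agreement expected by chance alone."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of items labeled identically in both runs.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two repeated runs of one chatbot on five questions.
run_1 = ["accurate", "accurate", "inaccurate", "accurate", "accurate"]
run_2 = ["accurate", "accurate", "inaccurate", "inaccurate", "accurate"]
print(f"kappa = {cohens_kappa(run_1, run_2):.2f}")  # kappa = 0.55
```

A kappa of 1.0 indicates perfect agreement across runs, so values near the study's reported minimum of 0.96 reflect highly repeatable responses.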

Results

Most chatbot responses were deemed accurate. Grok performed best, with answers that aligned perfectly with the opinions of the experts. DeepSeek exhibited the lowest accuracy, scoring 95.8%, and Mistral the lowest consistency (κ = 0.96).

Conclusions

Although the evaluated chatbots generally avoided endorsing scientifically unsupported methods, some of their answers could mislead patients and facilitate the spread of misinformation. Grok was the best performer in the group, providing consistently accurate responses, which suggests potential for use in clinical and educational settings.