Evaluating the Performance of AI Chatbots in Responding to Dental Implant FAQs: A Comparative Study

Abstract

Background: This study evaluated and compared the performance of five publicly accessible large language model (LLM)-based chatbots (ChatGPT-o1, Deepseek-R1, Google Gemini Advanced, Claude 3.5 Sonnet, and Perplexity Pro) in answering frequently asked questions (FAQs) about dental implant treatment. The primary goal was to assess the accuracy, completeness, clarity, relevance, and consistency of the chatbot-generated answers.

Methods: A total of 45 FAQs commonly encountered in clinical practice and in online patient forums regarding dental implants were selected and categorized into nine thematic domains. Each question was submitted to each chatbot individually using a standardized protocol. Responses were assessed independently by a panel of four dental experts and one layperson using a 5-point Likert scale. Statistical analysis was performed in Python on Google Colab.

Results: ChatGPT-o1 achieved the highest overall performance, particularly in relevance (M = 4.99), consistency (M = 4.97), and accuracy (M = 4.96). Deepseek-R1 followed closely, with strong scores in completeness and relevance. Claude 3.5 Sonnet ranked moderately, while Gemini Advanced and Perplexity Pro scored lower in completeness and clarity. Significant differences were observed among the chatbots across all criteria (p < 0.001). Inter-rater reliability was high (α = 0.87), confirming consistency among evaluators.

Conclusions: AI-driven chatbots demonstrated strong potential for delivering accurate, patient-friendly information about dental implant treatment. However, performance varied considerably across platforms, with ChatGPT-o1 and Deepseek-R1 proving the most reliable. These findings highlight the emerging role of AI chatbots as supplementary tools in dental education and patient communication, while underscoring the need for continued validation and ethical oversight in clinical applications.
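The abstract does not specify which statistical procedures were run in Python on Google Colab. As an illustrative sketch only, the code below shows one way such an analysis could be set up, assuming a Kruskal-Wallis test for between-chatbot differences and Cronbach's alpha for inter-rater reliability; the ratings, score ranges, and variable names are hypothetical and not taken from the study.

```python
# Illustrative sketch only: the study's exact tests are not stated in the abstract.
# Assumes Likert ratings (1-5) from 5 evaluators on 45 questions per chatbot,
# a Kruskal-Wallis omnibus test across chatbots, and Cronbach's alpha for
# inter-rater reliability. All data below are simulated placeholders.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Hypothetical ratings: 45 questions x 5 evaluators, one matrix per chatbot.
ratings = {
    "ChatGPT-o1": rng.integers(4, 6, size=(45, 5)),
    "Deepseek-R1": rng.integers(4, 6, size=(45, 5)),
    "Claude 3.5 Sonnet": rng.integers(3, 6, size=(45, 5)),
    "Gemini Advanced": rng.integers(3, 6, size=(45, 5)),
    "Perplexity Pro": rng.integers(3, 6, size=(45, 5)),
}

# Compare per-question mean scores across chatbots (non-parametric omnibus test).
groups = [scores.mean(axis=1) for scores in ratings.values()]
h_stat, p_value = kruskal(*groups)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.4f}")

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for an (items x raters) score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                          # number of raters
    rater_variance_sum = item_scores.var(axis=0, ddof=1).sum()
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - rater_variance_sum / total_variance)

# Inter-rater reliability pooled over all chatbots' ratings.
all_scores = np.vstack(list(ratings.values()))
print(f"Cronbach's alpha = {cronbach_alpha(all_scores):.2f}")
```

In a setup like this, each chatbot's 45 per-question mean ratings form one group for the omnibus test, and the pooled item-by-rater matrix feeds the reliability estimate; the published paper may have used different groupings, per-criterion tests, or a different reliability coefficient.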