Validity and reliability of AI chatbots on the comparative diagnosis and definitive management of deep caries based on position statements evaluated from post-graduate students and clinicians' perspectives


Abstract

Aim: This study evaluated the validity and reliability of four prominent AI chatbots (ChatGPT, Perplexity, Claude, and Gemini) in the comparative diagnosis and definitive management of deep caries, guided by global position statements from endodontic organizations and assessed by post-graduate students and clinicians.

Methods: Four AI chatbots (ChatGPT, Perplexity, Claude, and Gemini) were accessed through their respective APIs using the pro versions. Ten short case histories representing a spectrum of deep-caries scenarios, along with corresponding position statements from the European Society of Endodontology, the American Association of Endodontists, the Indian Endodontic Society, and others, were provided to each chatbot. The chatbots were prompted to generate diagnostic and management responses, repeated three times per case per chatbot. Responses were evaluated by two post-graduate students and three senior clinicians using a 5-point Likert scale and an adapted Global Quality Score (GQS) for validity, with Cronbach's alpha used to assess reliability. Statistical analysis included low- and high-threshold validity tests and intergroup reliability comparisons.

Conclusion: Perplexity exhibited the highest reliability and validity in deep-caries diagnosis and management compared with ChatGPT, Claude, and Gemini. While Perplexity, Claude, and Gemini demonstrated perfect or near-perfect validity under low-threshold criteria, only Perplexity maintained moderate validity at high-stringency levels. Overall variability and limited descriptive depth across all chatbot outputs highlight current limitations for clinical implementation. AI chatbots may serve as useful educational or adjunctive tools, but they cannot substitute for professional judgment in endodontic diagnosis and treatment. Future development should focus on improving performance mechanisms and regulatory oversight to support clinical accuracy and reliability.
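The abstract reports Cronbach's alpha as the reliability measure for the raters' scores. As an illustration only (the data below are hypothetical, not the study's), a minimal sketch of how alpha can be computed from a cases-by-raters score matrix:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for internal consistency.

    ratings: 2D array-like, rows = cases, columns = raters
    (here, raters play the role of "items").
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of raters
    rater_vars = ratings.var(axis=0, ddof=1)      # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of per-case summed scores
    return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

# Hypothetical 5-point Likert scores: 5 cases rated by 3 evaluators.
scores = [[1, 2, 1],
          [2, 2, 3],
          [3, 4, 3],
          [4, 4, 5],
          [5, 5, 4]]
print(cronbach_alpha(scores))  # approaches 1.0 as raters agree more closely
```

Alpha equals 1.0 when all raters assign identical scores to every case and decreases as their disagreement grows; values above roughly 0.7 to 0.8 are conventionally read as acceptable reliability.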
