Evaluating Large Language Models for Translating Caries Guidelines into Clinical Decision Support
Abstract
Objective: To systematically evaluate how well three large language models (LLMs), ChatGPT-4o, Grok-3, and DeepSeek, interpret and translate clinical practice guidelines for caries management and support clinical decision-making, and thereby to explore their potential role in disseminating dental knowledge and assisting clinical practice.
Methods: Based on the American Dental Association’s Evidence-Based Clinical Practice Guideline on Nonrestorative Treatments for Carious Lesions, a zero-shot prompting strategy was used to instruct each model to generate guideline summaries for two audiences: healthcare professionals and the general public. The models were also asked to provide diagnoses and treatment plans for three standardized clinical cases related to caries. Outputs were manually scored on five dimensions (accuracy, clarity, conciseness, logical coherence, and overall quality) using a 0–10 scale. Text consistency was also assessed with the automated metrics ROUGE-L and BLEU.
Results: For summaries written for the general public, ChatGPT-4o achieved the highest overall score and excelled particularly in clarity and logical coherence, while DeepSeek performed best in terminology accuracy and fidelity to the source text. For summaries intended for healthcare professionals, all three models performed well, with DeepSeek leading on the automated metrics. In clinical case management, ChatGPT-4o attained the highest composite score, significantly outperforming Grok-3 and DeepSeek, with superior diagnostic accuracy and more clinically relevant treatment recommendations.
Conclusion: Large language models show promise for translating dental guidelines and assisting clinical decision-making. However, limitations remain, including insufficient personalization and mechanical application of guidelines. Future efforts should focus on integrating multimodal data, enabling dynamic knowledge updates, and developing human–AI collaborative care models that balance standardized and personalized management of oral diseases.
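For readers unfamiliar with ROUGE-L, one of the automated metrics named in the Methods, the following is a minimal sketch of how such a score can be computed from the longest common subsequence (LCS) of tokens shared by a reference text and a model output. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the tokenization, reference texts, or software used. The function names `lcs_length` and `rouge_l`, the lowercase whitespace tokenization, the equal weighting of precision and recall, and the example sentences are all assumptions introduced here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Rolling one-row dynamic programme: prev[j] holds the LCS length of the
    # tokens of `a` processed so far and the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Sentence-level ROUGE-L precision, recall, and F1.

    Assumption: lowercase whitespace tokenization and an equally weighted
    F-measure; published implementations may tokenize or weight differently.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    lcs = lcs_length(ref, cand)
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    if lcs == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up sentences for illustration only; not taken from the guideline
    # or from any model output in the study.
    reference = "brush twice daily with fluoride toothpaste"
    candidate = "brush teeth twice daily using fluoride toothpaste"
    print(rouge_l(reference, candidate))
```

In this example the two sentences share a five-token subsequence ("brush twice daily fluoride toothpaste"), so recall is 5/6 and precision is 5/7. Because the LCS respects word order but tolerates gaps, ROUGE-L rewards outputs that preserve the reference's wording and sequence, which is consistent with the abstract's use of it alongside BLEU as a text-consistency measure.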