Expert-Based Evaluation of ChatGPT for Removable Partial Denture Design: Accuracy and Reliability Analysis
Abstract
Objective: This study evaluated the accuracy and reproducibility of ChatGPT-5 in designing removable partial dentures (RPDs) for partially edentulous arches and investigated whether the inclusion of Kennedy classification improves the clinical validity of the generated treatment plans.

Materials and Methods: Twenty standardized partially edentulous scenarios (10 maxillary and 10 mandibular) were presented to ChatGPT-5 under two prompting conditions: (1) describing only the dental chart (tooth configuration), and (2) including the specific Kennedy classification. To assess reproducibility, prompts were submitted across three separate sessions (N = 240 responses). Two experienced prosthodontists evaluated the outputs (major connector and direct/indirect retainers) using a 3-point Likert scale. Inter-rater reliability was calculated using Gwet’s AC1 and percent agreement.

Results: A total of 240 evaluations were analyzed by averaging the scores of the two prosthodontic experts across all case scenarios. Intra-rater reliability demonstrated moderate to substantial agreement for both experts, with Gwet’s AC1 values ranging from 0.637 to 0.733 and percent agreement between 0.733 and 0.800. Without Kennedy classification, correct response rates in the mandible ranged from 41.7% to 60.0%, whereas higher accuracy was observed in the maxilla (45.0%–76.7%). With Kennedy classification, mandibular accuracy increased across all components (major connectors: 55.0%; direct retainers: 76.7%; indirect retainers: 66.7%), while maxillary accuracy remained stable or decreased (50.0%–68.3%).

Conclusions: While ChatGPT shows promise as a supportive educational tool, its performance is highly sensitive to prompt engineering and anatomical context. The inclusion of Kennedy classification improves precision in mandibular cases but may introduce conflicting constraints in maxillary planning. Therefore, AI-generated plans currently require strict expert validation before clinical application.
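The agreement statistics reported above (percent agreement and Gwet’s AC1, a chance-corrected coefficient that is more stable than Cohen’s kappa under skewed rating distributions) can be reproduced for a two-rater design with a short function. This is an illustrative sketch, not the authors’ analysis code; the function name and the example ratings are hypothetical.

```python
def gwet_ac1(ratings1, ratings2):
    """Percent agreement and Gwet's AC1 for two raters on a categorical scale.

    ratings1, ratings2: equal-length lists of category labels (e.g. a
    3-point Likert scale), one entry per rated item.
    """
    assert len(ratings1) == len(ratings2), "raters must score the same items"
    n = len(ratings1)
    categories = sorted(set(ratings1) | set(ratings2))
    q = len(categories)

    # Observed percent agreement: share of items scored identically.
    pa = sum(a == b for a, b in zip(ratings1, ratings2)) / n

    # pi_k: mean proportion of all ratings (both raters pooled) in category k.
    pi = {k: (ratings1.count(k) + ratings2.count(k)) / (2 * n)
          for k in categories}

    # Gwet's chance-agreement term: pe = (1/(q-1)) * sum_k pi_k * (1 - pi_k).
    pe = sum(p * (1 - p) for p in pi.values()) / (q - 1)

    ac1 = (pa - pe) / (1 - pe)
    return pa, ac1


# Hypothetical example: 10 items scored on a 3-point scale by two raters.
pa, ac1 = gwet_ac1([1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                   [1, 2, 3, 1, 2, 1, 1, 2, 2, 1])
```

On this toy data the percent agreement is 0.80, in the same range as the study’s reported values; AC1 comes out lower because it discounts agreement expected by chance.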