Can Artificial Intelligence Counsel Expectant Mothers? Expert Evaluation of ChatGPT’s Educational Performance in Gestational Diabetes Mellitus

Abstract

Background: Gestational diabetes mellitus (GDM) is an increasingly common pregnancy complication associated with adverse maternal–fetal outcomes and a long-term risk of type 2 diabetes. Artificial intelligence (AI) tools such as ChatGPT have emerged as potential instruments for patient education, yet their reliability and safety in obstetric counseling remain underexplored. This study aimed to evaluate the educational quality of ChatGPT-5 responses to frequently asked questions about GDM.

Methods: This cross-sectional study was conducted between January and June 2025 at İzmir Tepecik Training and Research Hospital, Türkiye, following STROBE guidelines. Twenty patient-derived GDM questions were entered into ChatGPT-5 using a standardized prompt. Thirty obstetrics and gynecology specialists from Türkiye, the United Kingdom, Canada, and Germany independently rated the AI-generated answers on a 5-point Likert scale across five domains: accuracy, comprehensiveness, clarity, safety, and appropriateness. Reliability was analyzed using Cronbach’s α and the intraclass correlation coefficient (ICC). Comparative analyses were performed using Friedman and Dunn–Bonferroni tests.

Results: Overall, ChatGPT-5 achieved high ratings for accuracy (4.23 ± 0.32), clarity (4.18 ± 0.27), and appropriateness (4.09 ± 0.31), while comprehensiveness (3.92 ± 0.29) and safety (3.88 ± 0.34) scored lower. Internal consistency was strong (Cronbach’s α = 0.87) and interrater reliability was good (ICC = 0.79). Among thematic domains, “Definition, causes, and risk factors” received the highest scores (4.20 ± 0.30), whereas “Follow up and Treatment” scored the lowest (3.96 ± 0.33). Safety ratings were significantly lower than accuracy and clarity ratings (p = 0.018).

Conclusion: ChatGPT-5 demonstrated high accuracy and understandability in GDM counseling, suggesting potential use as a physician-supervised educational support tool. However, limitations in comprehensiveness and safety highlight the need for guideline-based refinement and human oversight to ensure ethical and reliable integration of AI in obstetric patient education.

Trial registration: Not applicable. This study had an observational cross-sectional design and did not involve patient enrollment or intervention.
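
For readers unfamiliar with the internal-consistency statistic reported above, the following is a minimal sketch of how Cronbach’s α can be computed from a ratings table. It is not the authors’ analysis code. The layout (one row per rated answer, one column per rating domain), the function name `cronbach_alpha`, and the toy scores are all assumptions made for illustration; none of the numbers come from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of scores.

    rows: list of equal-length lists. Each row is one rated unit
    (assumed here: one answer) and each column is one item
    (assumed here: one rating domain).
    """
    k = len(rows[0])  # number of items (columns)
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two rows")

    # Sample variance of each item (column) across rows.
    item_vars = [variance(col) for col in zip(*rows)]

    # Sample variance of the per-row total score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert scores, invented purely for illustration.
# Columns: accuracy, comprehensiveness, clarity, safety, appropriateness.
toy_ratings = [
    [5, 4, 5, 4, 4],
    [4, 4, 4, 3, 4],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5],
]

print(round(cronbach_alpha(toy_ratings), 3))
```

The sketch uses the sample variance (n − 1 denominator) for both the item variances and the total-score variance. The abstract does not state how the ratings were arranged for the reliability analysis, so the layout used here should be read as one plausible choice.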