Can Artificial Intelligence Counsel Expectant Mothers? Expert Evaluation of ChatGPT’s Educational Performance in Gestational Diabetes Mellitus

Mücahit Furkan BALCI
Celal AKDEMİR
Fatih YILDIRIM

Abstract

Background: Gestational diabetes mellitus (GDM) is an increasingly common pregnancy complication associated with adverse maternal–fetal outcomes and a long-term risk of type 2 diabetes. Artificial intelligence (AI) tools such as ChatGPT have emerged as potential instruments for patient education, yet their reliability and safety in obstetric counseling remain underexplored. This study aimed to evaluate the educational quality of ChatGPT-5 responses to frequently asked questions about GDM.

Methods: This cross-sectional study was conducted between January and June 2025 at İzmir Tepecik Training and Research Hospital, Türkiye, following STROBE guidelines. Twenty patient-derived GDM questions were entered into ChatGPT-5 using a standardized prompt. Thirty obstetrics and gynecology specialists from Türkiye, the United Kingdom, Canada, and Germany independently rated the AI-generated answers on a 5-point Likert scale across five domains: accuracy, comprehensiveness, clarity, safety, and appropriateness. Reliability was analyzed using Cronbach’s α and the intraclass correlation coefficient (ICC). Comparative analyses were performed using Friedman and Dunn–Bonferroni tests.

Results: Overall, ChatGPT-5 achieved high ratings for accuracy (4.23 ± 0.32), clarity (4.18 ± 0.27), and appropriateness (4.09 ± 0.31), while comprehensiveness (3.92 ± 0.29) and safety (3.88 ± 0.34) scored lower. Internal consistency was strong (Cronbach’s α = 0.87) and interrater reliability was good (ICC = 0.79). Among thematic domains, “Definition, causes, and risk factors” received the highest scores (4.20 ± 0.30), whereas “Follow up and Treatment” scored the lowest (3.96 ± 0.33). Safety ratings were significantly lower than accuracy and clarity ratings (p = 0.018).

Conclusion: ChatGPT-5 demonstrated high accuracy and understandability in GDM counseling, suggesting potential use as a physician-supervised educational support tool. However, limitations in comprehensiveness and safety highlight the need for guideline-based refinement and human oversight to ensure ethical and reliable integration of AI in obstetric patient education.

Trial registration: Not applicable. This study has an observational cross-sectional design and did not involve patient enrollment or intervention.
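
As an illustration of the internal-consistency statistic named in the Methods, the following is a minimal Python sketch of Cronbach’s α. It assumes each row is one rating observation and each column is one of the five rated domains; the abstract does not state how the authors arranged their data or which software they used. The function name `cronbach_alpha` and the `demo` ratings are hypothetical placeholders, not study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of ratings.

    rows: list of equal-length lists; each row is one observation and
    each column is one item (assumed here: one rated domain).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two observations")
    if any(len(r) != k for r in rows):
        raise ValueError("all rows must have the same number of items")

    # Sample variance of each item (column) across observations.
    item_vars = [variance(col) for col in zip(*rows)]
    # Sample variance of the per-observation total score.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert ratings (illustration only, NOT study data).
# Columns: accuracy, comprehensiveness, clarity, safety, appropriateness.
demo = [
    [4, 4, 4, 4, 3],
    [5, 4, 5, 4, 4],
    [4, 3, 4, 3, 3],
    [5, 5, 5, 4, 4],
    [3, 3, 4, 3, 3],
    [4, 4, 4, 4, 4],
]

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(demo):.2f}")
```

Values closer to 1 indicate that the items vary together. The sketch uses sample variances throughout, which is one common convention; other software may differ in details such as the handling of missing ratings.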

Version published to 10.21203/rs.3.rs-7983164/v1 on Research Square
Dec 3, 2025

Artificial Intelligence in Clinical Practice: Evaluating Chatbot Performance on Board-Level Questions in Geriatrics

This article has 2 authors:
1. Mert Zure
2. Metin Sökmen
This article has no evaluations. Latest version: Jan 21, 2026

The power of machine learning models in predicting gestational diabetes mellitus

This article has 7 authors:
1. Vahid Mehrnoush
2. Ali Haghighat
3. Anna Nami
4. Nazanin Rezaei
5. Fatemeh Darsareh
6. Farideh Montazeri
7. Mozhgan Saffari
This article has no evaluations. Latest version: Dec 17, 2025

Performance Evaluation of Large Language Models in Real-World Perinatal Medication Consultations: A Cross-Sectional Study

This article has 4 authors:
1. RAN WANG
2. Yifan Li
3. Xuewei Feng
4. Xin Feng
This article has no evaluations. Latest version: Feb 4, 2026
