Evaluating the Quality of AI-Generated Corrective Feedback for Medical Learners

Bonnie Desselle
Emma Simon
Amy Prudhomme
Christy Mumphrey
Leslie Reilly
Margaret Huntwork
Shubho Sarkar
George Hescock
Jessica Patrick
Kelly Gajewski
Amy Creel

Abstract

Background: Faculty feedback is vital for the professional growth of learners. Barriers such as time limitations and faculty members' limited confidence in their feedback skills may inhibit the provision of quality feedback. Generative large language models (LLMs), such as ChatGPT, have the potential to support faculty in providing high-quality corrective feedback by producing effective narrative feedback scripts for faculty to reference.

Objective: This study explores the quality of artificial intelligence (AI)-generated text in the context of corrective feedback, based on standardized scenarios of learner deficiencies.

Methods: Six medical education leaders blindly rated two narrative feedback scripts for each of eighteen learner deficiency scenarios; one script for each scenario was authored by educators, and the other was generated by ChatGPT. The raters scored each script in four domains (behavior-centric, balanced between positive and negative, non-judgmental, and focused) using a 3-point scale (0–2), where 0 = No, 1 = Somewhat, 2 = Yes. The mean scores of the educator-authored scripts and the ChatGPT-generated scripts were compared by domain. An overall score was also generated for the educator-authored and ChatGPT-generated scripts using descriptive statistics.

Results: In all four domains, ChatGPT-generated scripts received high mean scores, and each domain mean was higher than that of the educator-authored scripts. ChatGPT-generated scripts also achieved higher overall scores than educator-authored scripts for 17 of 18 scenarios.

Conclusion: AI-generated responses can be of high quality and may provide a useful tool for faculty who are preparing to give corrective feedback. Further studies are needed to explore the utility of AI in supporting feedback delivery.
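
The scoring scheme described in the Methods (several raters, four domains, a 0–2 scale, means compared by domain) can be illustrated with a minimal Python sketch. The abstract does not say how the overall score was computed, so the `overall_score` function below rests on an assumption: it averages, across raters, the sum of the four domain scores (range 0–8). The function names, domain keys, and the example ratings are hypothetical placeholders, not data or methods from the study.

```python
from statistics import mean

# Domain keys are illustrative labels for the four domains named in the abstract.
DOMAINS = ("behavior_centric", "balanced", "non_judgmental", "focused")
VALID_SCORES = (0, 1, 2)  # 0 = No, 1 = Somewhat, 2 = Yes


def validate(ratings):
    """Check that every rater scored every domain on the 0-2 scale."""
    for rating in ratings:
        for domain in DOMAINS:
            if rating[domain] not in VALID_SCORES:
                raise ValueError(f"invalid score for {domain}: {rating[domain]}")


def domain_means(ratings):
    """Mean score per domain for one script, averaged across raters."""
    validate(ratings)
    return {d: mean(r[d] for r in ratings) for d in DOMAINS}


def overall_score(ratings):
    """ASSUMPTION: overall = mean across raters of the summed domain scores (0-8)."""
    validate(ratings)
    return mean(sum(r[d] for d in DOMAINS) for r in ratings)


# Hypothetical ratings from two raters for a single script (placeholder values).
example_ratings = [
    {"behavior_centric": 2, "balanced": 1, "non_judgmental": 2, "focused": 2},
    {"behavior_centric": 2, "balanced": 2, "non_judgmental": 1, "focused": 2},
]

print(domain_means(example_ratings))
print(overall_score(example_ratings))
```

Running the same two functions on the educator-authored and the ChatGPT-generated script for a scenario would give the pair of per-domain means and overall scores that a comparison like the one in the Results relies on.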

Version published to 10.21203/rs.3.rs-9076837/v1 on Research Square
Apr 17, 2026

Related articles

A Program Evaluation for Teaching AI Prompt Engineering for Evidence-Based Medicine to Fourth Year Medical Students

This article has 2 authors:
1. Erik Langenau
2. Hsinliang Chen
This article has no evaluations. Latest version: Mar 27, 2026

Psychometric Performance and Student Perceptions of AI- versus Human-Generated Multiple-Choice Questions: The AHEAD Randomized Controlled Trial

This article has 14 authors:
1. Dheyaa Al-Najafi
2. Katherine D. Krause
3. Yundi Wang
4. Qi Kang Zuo
5. Maya Koblanski
6. Cameron J. Leong
7. Emma Schmidt
8. Muhammad Faran
9. Vanay Verma
10. Ravi Vyas
11. Matthew Campbell
12. Jaehyun Hwang
13. Jiawen Deng
14. Anita Palepu
This article has no evaluations. Latest version: Apr 3, 2026

From Feedback to Performance: Structured Reflection Enhances Skill Development and Reflective Thinking in Medical Students

This article has 6 authors:
1. Marzieh Naghavi Ravandi
2. Fakhrosadat Mirhosseini
3. Maryam Alizadeh
4. John Sanders
5. Seyed Gholamabbas Mousavi
6. Seyed Alireza Sajadifar
This article has no evaluations. Latest version: Mar 27, 2026
