Evaluating the Quality of AI-Generated Corrective Feedback for Medical Learners
Abstract
Background: Faculty feedback is vital for the professional growth of learners. Barriers such as faculty members' limited confidence in their feedback skills and time constraints may inhibit the provision of quality feedback. Generative large language models (LLMs), such as ChatGPT, have the potential to support faculty in providing high-quality corrective feedback by producing effective narrative feedback scripts for faculty to reference.

Objective: This study explores the quality of artificial intelligence (AI)-generated text in the context of corrective feedback, based on standardized scenarios of learner deficiencies.

Methods: Six medical education leaders blindly rated two narrative feedback scripts for each of eighteen learner deficiency scenarios; one script per scenario was authored by educators and the other was generated by ChatGPT. The raters scored each script in four domains (behavior-centric, balanced between positive and negative, non-judgmental, and focused) using a 3-point scale (0–2), where 0 = No, 1 = Somewhat, and 2 = Yes. The mean scores of the educator-authored scripts and the ChatGPT-generated scripts were compared by domain. An overall score was also generated for the educator-authored scripts and the ChatGPT-generated scripts using descriptive statistics.

Results: Across all four domains, ChatGPT-generated scripts received high mean scores, and in every domain they were rated higher than the educator-authored scripts. ChatGPT-generated scripts also achieved higher overall scores than the educator-authored scripts for 17 of 18 scenarios.

Conclusion: AI-generated responses can be of high quality and may provide a useful tool for faculty who are preparing to give corrective feedback. Further studies are needed to explore the utility of AI in supporting feedback delivery.
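
The scoring arithmetic described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The record layout, the function names `domain_means` and `overall_by_scenario`, and the toy ratings are all hypothetical. The abstract says only that overall scores were produced "using descriptive statistics", so treating the overall score as the mean of all domain ratings for a scenario is an assumption made here for illustration.

```python
from statistics import mean

# The four rating domains named in the abstract (identifiers are hypothetical).
DOMAINS = ("behavior_centric", "balanced", "non_judgmental", "focused")


def domain_means(ratings, source):
    """Mean 0-2 score per domain, pooled over raters and scenarios, for one script source."""
    return {
        d: mean(r["scores"][d] for r in ratings if r["source"] == source)
        for d in DOMAINS
    }


def overall_by_scenario(ratings, source):
    """Overall score per scenario for one source.

    Assumption: the overall score is the mean of every domain rating
    given by every rater for that scenario.
    """
    pooled = {}
    for r in ratings:
        if r["source"] == source:
            pooled.setdefault(r["scenario"], []).extend(r["scores"][d] for d in DOMAINS)
    return {scenario: mean(values) for scenario, values in pooled.items()}


def row(scenario, rater, source, values):
    """Build one rating record from four domain scores listed in DOMAINS order."""
    return {
        "scenario": scenario,
        "rater": rater,
        "source": source,
        "scores": dict(zip(DOMAINS, values)),
    }


# Toy ratings for illustration only; these are NOT data from the study.
ratings = [
    row(1, "A", "educator", (2, 1, 1, 2)),
    row(1, "B", "educator", (1, 1, 2, 2)),
    row(1, "A", "chatgpt", (2, 2, 2, 1)),
    row(1, "B", "chatgpt", (2, 2, 1, 2)),
]

educator_domains = domain_means(ratings, "educator")
chatgpt_domains = domain_means(ratings, "chatgpt")
educator_overall = overall_by_scenario(ratings, "educator")
chatgpt_overall = overall_by_scenario(ratings, "chatgpt")

# Count scenarios where the ChatGPT script has the higher overall score.
chatgpt_higher = sum(
    chatgpt_overall[s] > educator_overall[s] for s in educator_overall
)
```

With real data, `ratings` would hold one record per rater, scenario, and script source. The per-domain means and the count in `chatgpt_higher` correspond to the two comparisons reported in the Results.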