Does the Use of Generative AI Undermine Learning? A Randomized Controlled Trial
Abstract
Purpose: This study quantitatively evaluates the impact of using a large language model (LLM) on student learning outcomes.

Design: A 1:1 parallel-group randomized controlled trial.

Setting: Classroom experiments conducted at Komazawa University, Tokyo, Japan.

Intervention: The study comprised two experiments. In both, participants in the intervention group were permitted to use a large language model (Google Bard or Google Gemini) during a learning activity on a specific topic, while those in the control group were prohibited from using LLMs. Fourteen days later, all participants completed tasks on the learned topic without access to LLMs. Experiment 1, focused on "Essay Writing," was conducted in December 2023. Experiment 2, focused on "Reference Creation," was conducted in July 2024.

Participants: Experiment 1 involved 50 undergraduate students; Experiment 2 involved 65 undergraduate students.

Main Outcome Measures: Primary outcomes were scores on tasks completed during the initial learning phase and 14 days post-learning. Secondary outcomes included time spent on task input.

Results:
Essay Writing (Experiment 1): Fourteen days post-learning, on an essay-writing task completed without LLM access, the intervention group's mean score was 6.50 and the control group's mean score was 6.23. The mean difference was -0.27 (95% confidence interval (CI) -1.6 to 1.1), and the effect size was very small (Cohen's d = 0.053). These results do not support the conclusion that LLM use hinders learning in this context.
Reference Creation (Experiment 2): Fourteen days post-learning, on a reference-creation task completed without LLM access, the intervention group's mean score was 5.98 and the control group's mean score was 7.54. The mean difference was 1.56 (95% CI 0.98 to 2.14), and the effect size was large (Cohen's d = -1.301).
These results suggest that LLM use hindered learning in this context.

Conclusions: Experiment 1 indicates that LLM use did not impede learning, whereas Experiment 2 indicates that it did. The difference in results may stem from the absence of strict rules in essay writing (Experiment 1) compared to the presence of such rules in reference creation (Experiment 2). When strict rules increase the burden of task execution, participants permitted to use LLMs (Experiment 2) may have engaged in cognitive offloading. While acknowledging several limitations, this study employs a rigorous randomized controlled trial (RCT) methodology to investigate the impact of generative AI use on learning, specifically examining effects 14 days post-intervention, and thereby offers significant implications for higher education.

Pre-registration: https://osf.io/xgzwd
Project page: https://osf.io/jsa5n/
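For readers who want to see how the reported effect sizes are derived, the following is a minimal sketch of the standard Cohen's d calculation with a pooled standard deviation. The group means are taken from Experiment 1; the standard deviations and per-arm sample sizes are hypothetical placeholders, since the abstract does not report them.

```python
import math

def cohens_d(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    # Pooled SD weights each group's variance by its degrees of freedom.
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

# Experiment 1 means (from the abstract: intervention 6.50, control 6.23).
# SD = 5.0 and n = 25 per arm are illustrative assumptions only.
d = cohens_d(6.50, 5.0, 25, 6.23, 5.0, 25)
print(round(d, 3))
```

With equal hypothetical SDs of 5.0, the pooled SD is 5.0 and d works out to roughly 0.054, close in magnitude to the d = 0.053 reported for Experiment 1; the actual value depends on the true sample variances.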