Evidence of Impact and Interpretational Limits of Generative AI in STEM education - A Systematic Review and Meta-Analysis on Cognitive Learning Outcomes

Abstract

This systematic review and meta-analysis examines the impact of generative artificial intelligence (GAI) on cognitive learning outcomes in STEM education. We meta-analyzed externally assessed cognitive outcomes (RQ1) and, where quantitative pooling was not feasible, narratively synthesized reported learner challenges and supportive instructional interventions (RQ2-RQ3). Two pairs of raters independently screened and coded peer-reviewed quantitative studies published after 2017 that included a comparison/control group and examined cognitive learning in STEM involving learner-GAI interaction. A systematic search (ERIC, PsycINFO, Web of Science; updated May 7, 2026) and citation tracking yielded 85 eligible studies, of which 49 studies with 59 effect sizes met meta-analytic criteria. A random-effects meta-analysis shows an overall positive effect of GAI in STEM education, but the included studies exhibit substantial heterogeneity (I² = 96.32%), and the prediction interval ranges from Hedges’ g = −1.52 to g = 3.20. In line with this, funnel plot asymmetry suggests potential publication bias. To account for it, we also conducted a Robust Bayesian Meta-Analysis (RoBMA) and found that the overall positive effect of GAI in STEM education can be largely attributed to publication bias (µ = 0.076 ± 0.254); however, large heterogeneity remains (τ = 1.190 ± 0.166), which does not appear to be associated with publication bias. To explain the heterogeneity, we conducted a moderator analysis and found that the learning outcome (knowledge vs. skills) as well as the inversion-substitution-augmentation-redefinition (ISAR) level, which compares the cognitive activity of the intervention and control conditions in terms of the interactive-constructive-active-passive (ICAP) level, each explain part of the observed heterogeneity (R² = 12.5% and R² = 13.0%, respectively).
In contrast to the main effect, the moderator findings were robust under publication-bias correction using RoBMA. Overall, the RoBMA indicates that GAI appears promising in STEM when knowledge gains are the targeted learning outcome and when GAI is used to augment students’ learning activities rather than to substitute for them, in which case it can be detrimental. Furthermore, numerous studies (N = 33) reported large effects, but only because the cognitive activities of the students in the intervention and control groups were not comparable. Evidence for RQ2-RQ3 was limited and inconsistently reported; hence, these findings are presented as transparent, caveated qualitative insights rather than generalizable effect estimates. The review protocol was preregistered on AsPredicted (ID: 176450, URL: https://aspredicted.org/hpbd-dk75.pdf).
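For readers unfamiliar with the quantities reported above, the following is a minimal sketch of how a random-effects pooled estimate, I², and a prediction interval relate to one another. It assumes the DerSimonian-Laird estimator for the between-study variance and the common t-based prediction interval with k − 2 degrees of freedom; the abstract does not state which estimator the authors used, and the function name, argument names, and example numbers are hypothetical and not taken from the review.

```python
import math


def random_effects_summary(effects, variances, t_crit):
    """Pool study effects (e.g. Hedges' g) under a random-effects model.

    Assumption: DerSimonian-Laird estimate of tau^2; the review may have
    used a different estimator. `t_crit` is the two-sided critical value of
    the t distribution with k - 2 degrees of freedom, supplied by the caller
    to keep this sketch free of third-party dependencies.
    """
    k = len(effects)
    if k < 3 or k != len(variances):
        raise ValueError("need at least 3 studies and matching lengths")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights, pooled mean, and its standard error.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se_mu = math.sqrt(1.0 / sum_w_star)

    # Prediction interval for the true effect in a new, comparable study.
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return {
        "mu": mu,
        "tau2": tau2,
        "i2": i2,
        "pi_low": mu - half_width,
        "pi_high": mu + half_width,
    }


# Hypothetical effect sizes and sampling variances for five studies.
example = random_effects_summary(
    effects=[0.2, 1.5, -0.4, 2.1, 0.8],
    variances=[0.04, 0.05, 0.03, 0.06, 0.04],
    t_crit=3.182,  # t quantile at 0.975 with k - 2 = 3 degrees of freedom
)
```

With widely scattered effects such as these, the prediction interval can span zero even when the pooled mean is positive, which is the pattern the abstract describes for the full data set.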
