Research Syntheses on AI in Education Offer Limited Educational Insights

Abstract

Artificial intelligence (AI) is increasingly used in education. As research in this area grows rapidly, so does the number of meta-analyses. In this commentary, we argue that claims in recent meta-analyses are sometimes overstated and risk misrepresenting the strength of evidence. This issue is not limited to the meta-analyses themselves but indicates broader problems in the field. We focus on a key issue that, in our view, significantly limits the current understanding of the effects of AI in education: the choice of outcome measures. Drawing on educational intervention research, we examine five often-overlooked aspects of performance outcomes in AI in education: (a) standardized versus self-developed tests, (b) broad versus narrow outcome measures, (c) over-alignment between intervention and assessment, (d) lack of real-life educational outcomes, and (e) AI-specific challenges, such as hybrid human-AI performance measurement. To illustrate the scope of these concerns, we coded outcome measures from studies included in two recent meta-analyses: the most comprehensive to date on chatbots and on learning with ChatGPT, respectively. We found, for example, that only 12% of studies used standardized achievement tests, and 10% assessed real-life educational outcomes (e.g., course grades). These design features limit the applied relevance of the field and reduce the usefulness of research syntheses for informing educational theory and practice. We suggest that insights from educational intervention research can help address these measurement issues. Our commentary offers actionable recommendations, including moderating meta-analyses by outcome type and prioritizing distal outcomes, to strengthen the empirical foundation of AI in education research.