A Benchmark for Math Misconceptions: Bridging Gaps in Middle School Algebra with AI-Supported Instruction
Abstract
This study introduces an evaluation benchmark for middle school algebra to be used in artificial intelligence (AI) based educational platforms. The goal is to support the design of AI systems that can enhance learners' conceptual understanding of algebra by taking into account learners' current level of algebra comprehension. The dataset comprises 55 algebra misconceptions and common errors, along with 220 diagnostic examples identified in prior peer-reviewed studies. We provide an example application using GPT-4, observing a range of precision and recall scores depending on the topic and experimental setup, reaching 83.9% when including educators' feedback and restricting testing by topic. We found that topics such as ratios and proportions prove as difficult for GPT-4 as they are for students. We include a human assessment of GPT-4's results, as well as feedback from five middle school math educators on the clarity and occurrence of the misconceptions in the dataset and on the potential use of AI in conjunction with the dataset. Most educators (80% or more) indicated that they encounter these misconceptions among their students, suggesting the dataset's relevance to teaching middle school algebra. Despite varied familiarity with AI tools, four out of five educators expressed interest in using the dataset with AI to diagnose students' misconceptions or to train teachers. The results emphasize the importance of topic-constrained testing, the need for multimodal approaches, and the value of human expertise in gaining practical insights when using AI for human learning.
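To make the evaluation setup concrete, the sketch below shows one way per-topic precision and recall could be computed when a model's predicted misconception labels are scored against gold labels, reflecting the topic-constrained testing the abstract highlights. All names, misconception IDs, and data here are illustrative assumptions, not taken from the paper's dataset or code.

```python
# Hypothetical sketch: score misconception diagnoses per topic.
# Topics, misconception IDs (e.g. "M12"), and examples are made up
# for illustration; they do not come from the benchmark itself.
from collections import defaultdict

def precision_recall_by_topic(examples):
    """examples: list of dicts with 'topic', 'gold', and 'predicted'
    misconception IDs. Returns {topic: {'precision': p, 'recall': r}}."""
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for ex in examples:
        topic = ex["topic"]
        if ex["predicted"] == ex["gold"]:
            tp[topic] += 1
        else:
            fp[topic] += 1  # wrong misconception predicted
            fn[topic] += 1  # correct misconception missed
    scores = {}
    for topic in tp.keys() | fp.keys() | fn.keys():
        p = tp[topic] / (tp[topic] + fp[topic]) if (tp[topic] + fp[topic]) else 0.0
        r = tp[topic] / (tp[topic] + fn[topic]) if (tp[topic] + fn[topic]) else 0.0
        scores[topic] = {"precision": p, "recall": r}
    return scores

examples = [
    {"topic": "ratios", "gold": "M12", "predicted": "M12"},
    {"topic": "ratios", "gold": "M14", "predicted": "M03"},
    {"topic": "equations", "gold": "M07", "predicted": "M07"},
]
print(precision_recall_by_topic(examples))
```

Reporting scores per topic, rather than one aggregate number, is what surfaces the kind of finding the abstract describes, e.g. that ratios and proportions are harder than other topics.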