The Convergence of Federated Learning, Knowledge Graphs, and Large Language Models for Language Learning: A Scoping Review
Abstract
Large Language Models (LLMs) in Intelligent Computer-Assisted Language Learning enable highly personalized learning, yet raise significant challenges related to pedagogical grounding, data privacy, and instructional validity. Although Knowledge Graphs (KGs) and Federated Learning (FL) can mitigate these issues in isolation, evidence on systematic FL–KG–LLM integration for educational language learning remains limited. This scoping review maps the FL–KG–LLM convergence landscape. Following PRISMA-ScR guidelines, we searched six databases and screened 51 papers (2019–2025) using automated extraction. Our findings indicate limited convergence: no papers integrate all three domains, and 58.8% of approaches remain confined to isolated technological silos. Reporting is also uneven across the corpus, with an average “Not Reported” (NR) rate of 84.5%, most notably for privacy mechanisms (92.2%), validation metrics (90.2%), and Common European Framework of Reference for Languages (CEFR) alignment (88.2%). Domain-specific analysis reveals two distinct patterns: inter-domain gaps (disciplinary silos resulting in expected CEFR absence in single-domain papers) and intra-domain gaps (failure to report domain-critical variables, including 100% parameter NR in FL studies, 86.7% validation NR in KG studies, and 100% CEFR NR in convergence papers). Taken together, these gaps suggest that pedagogical grounding is treated as optional rather than structural. We therefore identify two pillars of pedagogical grounding: a Grounding Pillar, which constrains LLM outputs via Knowledge Graph rules, and a Validation Pillar, which concerns how authoritative frameworks (e.g., CEFR) are mapped onto Knowledge Graph schemas and evaluated. 
The near-universal absence of CEFR alignment and validation reporting suggests that this second pillar is currently missing, a deficit we term the Integrity Gap: a systematic disconnection between technological innovation and pedagogical grounding in Intelligent Computer-Assisted Language Learning. By reframing the problem as one of upstream control and validation, this review informs the design of user-facing automated systems where trust, transparency, and human oversight are critical.