FairSYN-Edu: A Fairness-Aware, Privacy-Preserving Diffusion Model for Educational Data Synthesis

Abstract

The increasing demand for privacy-preserving, ethically aligned synthetic data generation in education has highlighted the limitations of existing tabular data generators. Traditional approaches often sacrifice fairness or privacy in pursuit of predictive accuracy, rendering them unsuitable for high-stakes academic settings. In this paper, we propose FairSYN-Edu, a novel diffusion-based synthetic data generation framework designed for educational data. By integrating adversarial debiasing and differentially private training into the generative process, FairSYN-Edu jointly optimizes utility, fairness, and privacy. We evaluate our approach on three real-world educational datasets spanning MOOC, K–12 tutoring, and LMS environments. Experimental results demonstrate that FairSYN-Edu achieves significantly lower demographic disparities, maintains competitive predictive performance (RMSE = 0.402), and provides moderate resistance to membership inference attacks (AUC = 0.705). Ablation studies, error gap analysis, and SHAP-based interpretability evaluations confirm the robustness and ethical soundness of our method. We release the full implementation, synthetic benchmark suite, and documentation to foster reproducibility and responsible AI practices in education.
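
As a reading aid for the three reported quantities, the following minimal Python sketch shows one common way to compute a prediction RMSE, a demographic disparity gap, and a membership inference attack AUC. It is an illustration under stated assumptions, not the authors' evaluation code: the function names (`rmse`, `group_gap`, `attack_auc`) are hypothetical, the disparity measure is assumed to be the largest difference in mean prediction between demographic groups, and the AUC is computed with the standard pairwise-ranking definition.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def group_gap(y_pred, groups):
    """Largest difference in mean prediction between any two groups.

    Assumption: this is one simple disparity measure; the paper may
    define its demographic disparity metric differently.
    """
    totals = {}
    for p, g in zip(y_pred, groups):
        total, count = totals.get(g, (0.0, 0))
        totals[g] = (total + p, count + 1)
    means = [total / count for total, count in totals.values()]
    return max(means) - min(means)


def attack_auc(scores, is_member):
    """AUC of a membership inference attack.

    scores: attacker confidence that each record was in the training set.
    is_member: truthy for records that really were training members.
    Computed as the fraction of (member, non-member) pairs that the
    attacker ranks correctly, counting ties as half.
    """
    pos = [s for s, m in zip(scores, is_member) if m]
    neg = [s for s, m in zip(scores, is_member) if not m]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An attack AUC of 0.5 means the attacker does no better than random guessing, while 1.0 means members and non-members are perfectly separated, so under this definition lower values indicate stronger privacy.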