Impact of Data Distillation on Fairness in Machine Learning Models
Abstract
Our research examines the implications of dataset distillation, specifically the synthesis of compact datasets, on fairness in machine learning. Dataset distillation synthesizes smaller, representative datasets to improve computational efficiency. However, it is critical to understand how this synthesis process affects the intrinsic biases in the data and, in turn, the fairness of the resulting machine learning models. We conduct a comprehensive analysis to evaluate whether distilling datasets into smaller sizes affects the bias in trained models, and how the size of these distilled datasets influences both accuracy and fairness across diverse classes. Our experiments reveal a significant trade-off between accuracy and fairness when applying the state-of-the-art Dataset Distillation by Matching Training Trajectories (MTT) method. Notably, we demonstrate that while increasing the size of the distilled dataset improves model accuracy, it also increases the variance in performance across classes. This research is fundamental to understanding the delicate balance between efficient training and fairness in machine learning. The findings contribute to the ongoing discourse on ethical AI, offering actionable insights for practitioners seeking to optimize dataset size without compromising fairness. Our conclusions underscore the necessity of rigorously evaluating data distillation methodologies in real-world applications where both efficiency and fairness are of paramount importance.
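The across-class fairness notion referenced in the abstract can be made concrete with a minimal sketch. The function names and the use of per-class accuracy variance as a fairness proxy are illustrative assumptions, not the paper's exact metric: a model that is accurate overall but uneven across classes will show high variance and a large gap between its best- and worst-served classes.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    # Accuracy computed separately for each class.
    accs = []
    for c in range(num_classes):
        mask = y_true == c
        accs.append((y_pred[mask] == c).mean() if mask.any() else 0.0)
    return np.array(accs)

def fairness_gap(y_true, y_pred, num_classes):
    # Variance and max-min spread of per-class accuracy:
    # higher values indicate less uniform (less fair) performance.
    accs = per_class_accuracy(y_true, y_pred, num_classes)
    return accs.var(), accs.max() - accs.min()

# Toy labels/predictions (hypothetical): class 0 is predicted well,
# classes 1 and 2 are increasingly under-served.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 0, 0])
variance, gap = fairness_gap(y_true, y_pred, num_classes=3)
```

Here the per-class accuracies are 1.0, 2/3, and 1/3, so the variance is about 0.074 and the gap is 2/3, even though overall accuracy is 2/3. Tracking such a metric alongside overall accuracy is one way to observe the accuracy-fairness trade-off the abstract describes as the distilled dataset size grows.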