Impact of Data Distillation on Fairness in Machine Learning Models
Abstract
Our research examines the implications of dataset distillation, specifically the synthesis of compact datasets, on fairness in machine learning. Dataset distillation synthesizes smaller, representative datasets to improve computational efficiency. However, it is critical to understand how this synthesis process affects the intrinsic biases in the data and, in turn, the fairness of the resulting machine learning models. We conduct a comprehensive analysis to evaluate whether distilling datasets into smaller sizes affects the bias in trained models, and how the size of these distilled datasets influences both accuracy and fairness across diverse classes. Our experiments reveal a significant trade-off between accuracy and fairness when applying the state-of-the-art Dataset Distillation by Matching Training Trajectories (MTT) method. Notably, we demonstrate that while increasing the size of the distilled dataset improves model accuracy, it also increases the variance in performance across classes. This research is fundamental to understanding the delicate balance between efficient training and fairness in machine learning. The findings contribute to the ongoing discourse on ethical AI, offering actionable insights for practitioners seeking to optimize dataset size without compromising fairness. Our conclusions underscore the necessity of rigorously evaluating data distillation methodologies in real-world applications where both efficiency and fairness are of paramount importance.
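The across-class fairness notion referenced in the abstract can be made concrete with a minimal sketch. The function names and the use of per-class accuracy variance as a fairness proxy are illustrative assumptions, not the paper's exact metric: a model that is accurate overall but uneven across classes will show high variance and a large gap between its best- and worst-served classes.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    # Accuracy computed separately for each class.
    accs = []
    for c in range(num_classes):
        mask = y_true == c
        accs.append((y_pred[mask] == c).mean() if mask.any() else 0.0)
    return np.array(accs)

def fairness_gap(y_true, y_pred, num_classes):
    # Variance and max-min spread of per-class accuracy:
    # higher values indicate less uniform (less fair) performance.
    accs = per_class_accuracy(y_true, y_pred, num_classes)
    return accs.var(), accs.max() - accs.min()

# Toy labels/predictions (hypothetical): class 0 is predicted well,
# classes 1 and 2 are increasingly under-served.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 0, 0])
variance, gap = fairness_gap(y_true, y_pred, num_classes=3)
```

Here the per-class accuracies are 1.0, 2/3, and 1/3, so the variance is about 0.074 and the gap is 2/3, even though overall accuracy is 2/3. Tracking such a metric alongside overall accuracy is one way to observe the accuracy-fairness trade-off the abstract describes as the distilled dataset size grows.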