Data Pruning in Machine Learning Using Influence Function

Abstract

Data pruning is an essential technique in machine learning, enabling efficient dataset management without compromising model accuracy. This study explores the potential of influence functions in designing novel data pruning algorithms. Two methods are introduced: the first combines data selection and dataset distillation by merging similar data points in feature space, while the second removes low-influence redundant examples. The proposed methods aim to reduce computational costs while maintaining generalization performance. Empirical results demonstrate the effectiveness of the second method, achieving significant dataset size reductions across multiple benchmarks, including CIFAR-10, Cats_vs_Dogs, and MNIST, with minimal to no loss in accuracy. This paper advances the mathematical foundations of data pruning and highlights the potential for broader adoption of these techniques in real-world applications.
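The second method, pruning low-influence redundant examples, can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal illustration using a common influence-function formulation for L2-regularized logistic regression, where each training point's self-influence is s_i = g_iᵀ H⁻¹ g_i (g_i the per-example loss gradient, H the regularized Hessian), and the lowest-scoring points are treated as redundant. All names and the 30% pruning ratio below are assumptions for the sake of the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lam=1e-2, lr=0.5, steps=500):
    # Full-batch gradient descent on L2-regularized logistic loss.
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / n + lam * w)
    return w

def self_influence(X, y, w, lam=1e-2):
    # Influence-function self-scores: s_i = g_i^T H^{-1} g_i, where
    # g_i is the per-example gradient and H the regularized Hessian.
    n, d = X.shape
    p = sigmoid(X @ w)
    G = (p - y)[:, None] * X                            # per-example gradients, n x d
    H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)
    HinvG = np.linalg.solve(H, G.T)                     # H^{-1} g_i for all i, d x n
    return np.einsum("nd,dn->n", G, HinvG)

# Synthetic data stands in for a real training set (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)

w = train_logreg(X, y)
scores = self_influence(X, y, w)
# Prune the 30% of examples with the lowest self-influence (redundant points).
keep = np.argsort(scores)[int(0.3 * len(y)):]
print(len(keep))  # 140 of 200 examples retained
```

In practice the model (e.g., a deep network on CIFAR-10) makes the exact Hessian inverse intractable, so influence scores are typically approximated with Hessian-vector products or low-rank methods; the closed-form solve above works only because the example model is a small convex one.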
