Data Pruning in Machine Learning Using Influence Function

Abstract

Data pruning is an essential technique in machine learning, enabling efficient dataset management without compromising model accuracy. This study explores the potential of influence functions in designing novel data pruning algorithms. Two methods are introduced: the first combines data selection and dataset distillation by merging similar data points in feature space, while the second removes low-influence redundant examples. The proposed methods aim to reduce computational costs while maintaining generalization performance. Empirical results demonstrate the effectiveness of the second method, achieving significant dataset size reductions across multiple benchmarks, including CIFAR-10, Cats_vs_Dogs, and MNIST, with minimal to no loss in accuracy. This paper advances the mathematical foundations of data pruning and highlights the potential for broader adoption of these techniques in real-world applications.
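The second method, pruning low-influence redundant examples, can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal illustration using a common influence-function formulation for L2-regularized logistic regression, where each training point's self-influence is s_i = g_iᵀ H⁻¹ g_i (g_i the per-example loss gradient, H the regularized Hessian), and the lowest-scoring points are treated as redundant. All names and the 30% pruning ratio below are assumptions for the sake of the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lam=1e-2, lr=0.5, steps=500):
    # Full-batch gradient descent on L2-regularized logistic loss.
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / n + lam * w)
    return w

def self_influence(X, y, w, lam=1e-2):
    # Influence-function self-scores: s_i = g_i^T H^{-1} g_i, where
    # g_i is the per-example gradient and H the regularized Hessian.
    n, d = X.shape
    p = sigmoid(X @ w)
    G = (p - y)[:, None] * X                            # per-example gradients, n x d
    H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)
    HinvG = np.linalg.solve(H, G.T)                     # H^{-1} g_i for all i, d x n
    return np.einsum("nd,dn->n", G, HinvG)

# Synthetic data stands in for a real training set (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)

w = train_logreg(X, y)
scores = self_influence(X, y, w)
# Prune the 30% of examples with the lowest self-influence (redundant points).
keep = np.argsort(scores)[int(0.3 * len(y)):]
print(len(keep))  # 140 of 200 examples retained
```

In practice the model (e.g., a deep network on CIFAR-10) makes the exact Hessian inverse intractable, so influence scores are typically approximated with Hessian-vector products or low-rank methods; the closed-form solve above works only because the example model is a small convex one.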
