Efficient Model Pruning for Large-Scale Deep Learning Models: Enhancing Performance and Reducing Computational Overhead
Abstract
Deep learning models, particularly large-scale language and vision architectures, are computationally intensive due to their vast number of parameters and complex network designs. This paper presents an improved method for model pruning aimed at reducing the computational burden while maintaining performance comparable to unpruned models. By analyzing weights, biases, activations, and other key indicators, we propose a novel algorithm that identifies and removes neurons or connections contributing least to the model's output quality. Our approach achieves higher pruning efficiency across a range of pruning ratios, resulting in smaller, faster, and more cost-effective models. Experimental results demonstrate that our method significantly outperforms state-of-the-art (SOTA) pruning techniques in both inference speed and memory usage, with negligible degradation in accuracy. This work contributes to the development of resource-efficient models suitable for deployment in environments with limited computational resources, paving the way for more scalable and sustainable deep learning applications.
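To make the importance-scoring idea described above concrete, the following is a minimal sketch of one common baseline: unstructured magnitude pruning in PyTorch. It is not the paper's actual scoring rule (which also considers biases and activations); the function name `prune_linear_by_magnitude` and the `prune_ratio` parameter are illustrative assumptions.

```python
import torch
import torch.nn as nn

def prune_linear_by_magnitude(layer: nn.Linear, prune_ratio: float) -> nn.Linear:
    """Zero out the weights with the smallest absolute magnitude.

    A generic magnitude-pruning baseline for illustration only, not the
    paper's exact criterion. `prune_ratio` is the fraction of weights
    to remove (e.g. 0.5 removes half of the weights).
    """
    with torch.no_grad():
        magnitudes = layer.weight.abs().flatten()
        k = int(prune_ratio * magnitudes.numel())
        if k == 0:
            return layer
        # k-th smallest magnitude serves as the pruning threshold.
        threshold = torch.kthvalue(magnitudes, k).values
        # Keep only weights whose magnitude exceeds the threshold.
        mask = (layer.weight.abs() > threshold).float()
        layer.weight.mul_(mask)
    return layer

# Usage: prune 50% of the weights of a toy layer and report sparsity.
layer = nn.Linear(256, 128)
prune_linear_by_magnitude(layer, prune_ratio=0.5)
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity after pruning: {sparsity:.2%}")
```

Structured variants of this idea score and remove whole neurons or channels rather than individual weights, which is what yields the inference-speed and memory gains reported in the abstract.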