A Lightweight Neural Network Compression Pipeline for Resource-Constrained Edge AI Systems
Abstract
Deep neural networks have achieved remarkable performance across a wide range of tasks; however, their deployment on resource-constrained edge devices remains challenging due to high storage and computational requirements. This work proposes a lightweight neural network compression pipeline designed to enable efficient deployment of deep models in constrained environments. The proposed framework integrates knowledge distillation, structured pruning, and dynamic quantization to substantially reduce model size while maintaining competitive predictive performance. Experiments on the widely used MNIST and Fashion-MNIST datasets evaluate the effectiveness of the compression strategy. The results show that the proposed pipeline achieves an approximately 20× reduction in model size while preserving classification accuracy, with only marginal degradation relative to the original model. These findings indicate that combining multiple lightweight compression techniques can produce compact yet accurate neural models suitable for edge deployment. The proposed approach provides a practical and reproducible framework for developing storage-efficient deep learning systems in resource-limited environments, thereby broadening the accessibility and scalability of artificial intelligence applications.
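To make the three-stage pipeline concrete, the following is a minimal sketch assuming a PyTorch implementation. The student architecture, the temperature and weighting in the distillation loss, and the 50% pruning ratio are illustrative assumptions for MNIST-scale inputs, not the authors' code or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


class StudentNet(nn.Module):
    """Small student model (hypothetical architecture for 28x28 inputs)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(F.relu(self.fc1(x)))


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Stage 1, knowledge distillation: blend soft teacher targets with hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # standard temperature rescaling of the soft-target gradient
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


student = StudentNet()

# One illustrative distillation step on dummy data (a frozen teacher would
# normally supply teacher_logits during training).
dummy = torch.randn(4, 1, 28, 28)
teacher_logits = torch.randn(4, 10)  # stand-in for the teacher's output
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student(dummy), teacher_logits, labels)

# Stage 2, structured pruning: zero out 50% of fc1's output neurons by L2
# norm, then bake the mask into the weight tensor.
prune.ln_structured(student.fc1, name="weight", amount=0.5, n=2, dim=0)
prune.remove(student.fc1, "weight")

# Stage 3, dynamic quantization: convert Linear layers to int8 weights.
quantized = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)

print(quantized(dummy).shape)  # torch.Size([4, 10])
```

In a setup along these lines, the bulk of the reported size reduction would come from the smaller student and the int8 weight storage, with structured pruning contributing further savings once zeroed rows are physically removed; the exact factor depends on the teacher/student size ratio, which the sketch does not fix.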