A Lightweight Neural Network Compression Pipeline for Resource-Constrained Edge AI Systems
Abstract
Deep neural networks have achieved remarkable performance across a wide range of tasks; however, their deployment on resource-constrained edge devices remains challenging due to high storage and computational requirements. This work proposes a lightweight neural network compression pipeline designed to enable efficient deployment of deep models in constrained environments. The proposed framework integrates knowledge distillation, structured pruning, and dynamic quantization to substantially reduce model size while maintaining competitive predictive performance. Experiments on the widely used MNIST and Fashion-MNIST datasets evaluate the effectiveness of the compression strategy. The results show that the proposed pipeline achieves an approximately 20× reduction in model size while preserving classification accuracy, with only marginal degradation relative to the original model. These findings indicate that combining multiple lightweight compression techniques can produce compact yet accurate neural models suitable for edge deployment. The proposed approach provides a practical and reproducible framework for developing storage-efficient deep learning systems in resource-limited environments, thereby broadening the accessibility and scalability of artificial intelligence applications.
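To make the three-stage pipeline concrete, the following is a minimal sketch assuming a PyTorch implementation. The student architecture, the temperature and weighting in the distillation loss, and the 50% pruning ratio are illustrative assumptions for MNIST-scale inputs, not the authors' code or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


class StudentNet(nn.Module):
    """Small student model (hypothetical architecture for 28x28 inputs)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc2(F.relu(self.fc1(x)))


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Stage 1, knowledge distillation: blend soft teacher targets with hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # standard temperature rescaling of the soft-target gradient
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


student = StudentNet()

# One illustrative distillation step on dummy data (a frozen teacher would
# normally supply teacher_logits during training).
dummy = torch.randn(4, 1, 28, 28)
teacher_logits = torch.randn(4, 10)  # stand-in for the teacher's output
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student(dummy), teacher_logits, labels)

# Stage 2, structured pruning: zero out 50% of fc1's output neurons by L2
# norm, then bake the mask into the weight tensor.
prune.ln_structured(student.fc1, name="weight", amount=0.5, n=2, dim=0)
prune.remove(student.fc1, "weight")

# Stage 3, dynamic quantization: convert Linear layers to int8 weights.
quantized = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)

print(quantized(dummy).shape)  # torch.Size([4, 10])
```

In a setup along these lines, the bulk of the reported size reduction would come from the smaller student and the int8 weight storage, with structured pruning contributing further savings once zeroed rows are physically removed; the exact factor depends on the teacher/student size ratio, which the sketch does not fix.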