Synergistic Entropy Coding and Quantization Enable Efficient On-Device Neural Networks
Abstract
Deploying deep neural networks (DNNs) on edge devices poses significant challenges due to constrained memory, compute, and energy resources. Conventional model compression pipelines, comprising separate stages of pruning, quantization, and entropy coding, often fail to deliver optimal trade-offs between efficiency and accuracy. In this work, we propose Synergistic Entropy–Quantization (SyE-C²Q), a unified compression framework that jointly optimizes quantization precision and entropy coding based on the statistical structure of model parameters. By aligning quantization levels with symbol probabilities and adapting quantization step sizes to entropy estimates, SyE-C²Q reduces the bit-rate while maintaining high inference accuracy. Extensive experiments on MobileNet-V2 and ResNet-18 with CIFAR-10 and ImageNet demonstrate that SyE-C²Q achieves up to 3.6× model compression, <1% accuracy degradation, and ~40% energy savings compared to conventional post-training quantization techniques. Furthermore, the compressed models exhibit improved inference latency and memory utilization on ARM-based edge hardware. Unlike traditional pipelines, SyE-C²Q integrates entropy-guided quantization directly into the compression loop, establishing a new benchmark in the design of resource-efficient, deployable deep learning systems.
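The abstract describes adapting quantization step sizes to entropy estimates of the weight distribution. The sketch below is a minimal, illustrative take on that general idea, not the authors' SyE-C²Q algorithm: it bisects a uniform quantization step size until the empirical entropy of the quantized symbols meets a target bit budget. The function names (`entropy_guided_quantize`, `symbol_entropy`), the `target_bits` parameter, and the bisection heuristic are assumptions introduced for illustration.

```python
# Illustrative sketch only: uniform quantization with an entropy-guided step
# size. The bisection-on-step heuristic is an assumption, not the paper's
# SyE-C2Q method.
import numpy as np

def symbol_entropy(symbols):
    """Shannon entropy (bits per symbol) of a discrete symbol array."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_guided_quantize(weights, target_bits=4.0, iters=20):
    """Adapt a uniform quantization step so the empirical entropy of the
    quantized symbols approaches a target bit-rate budget."""
    w = weights.ravel()
    lo, hi = 1e-8, float(np.abs(w).max()) + 1e-8  # bracket for the step size
    step = hi
    for _ in range(iters):
        step = 0.5 * (lo + hi)
        q = np.round(w / step).astype(np.int64)
        if symbol_entropy(q) > target_bits:
            lo = step   # entropy too high: coarsen (larger step)
        else:
            hi = step   # under budget: refine (smaller step)
    q = np.round(w / step).astype(np.int64)
    return q.reshape(weights.shape), step, symbol_entropy(q)

# Usage on a synthetic, roughly Laplacian weight tensor.
rng = np.random.default_rng(0)
w = rng.laplace(scale=0.05, size=(256, 256)).astype(np.float32)
q, step, bits = entropy_guided_quantize(w, target_bits=4.0)
w_hat = q * step
print(f"step={step:.5f}, entropy={bits:.2f} bits/weight, "
      f"MSE={np.mean((w - w_hat) ** 2):.2e}")
```

Under this assumed scheme, the resulting symbol stream would then be passed to an entropy coder (e.g., arithmetic or Huffman coding) whose code lengths match the measured symbol probabilities; the joint optimization of that coder with the step size is what the paper's unified framework addresses.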