Towards Efficient CNN Acceleration: A Review of Sparsity, Compression, and Emerging Hardware Architectures
Abstract
The rising computational demands of convolutional neural networks (CNNs), especially in edge and real-time systems, have prompted extensive research into energy-efficient, high-throughput hardware accelerators. Recent innovations span model-level optimizations such as sparsity and compression as well as circuit-level advances leveraging FPGAs, ASICs, and beyond-CMOS technologies. This review surveys five representative studies that exemplify state-of-the-art approaches in these domains. We examine sparsity-aware techniques such as block pruning and dataflows tailored to graph convolutions, highlight floating-point feature-map compression schemes that reduce off-chip memory access, and explore low-power hardware architectures including spintronic and CNTFET-based binarized neural networks (BNNs). We also discuss novel data routing mechanisms such as the dual Benes network, which enables flexible and efficient dataflow reorganization. Through comparative analysis, we identify trade-offs in accuracy, hardware cost, and scalability across platforms. Finally, we outline open challenges and propose future directions for integrating these strategies into next-generation CNN accelerators. This paper aims to give researchers a cohesive understanding of the rapidly evolving landscape of efficient deep-learning hardware design.
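As a point of reference for the model-level techniques named above, the sketch below illustrates one common form of block pruning: weight blocks are ranked by magnitude and the weakest fraction is zeroed to produce a hardware-friendly block-sparse pattern. The function name, block size, and sparsity target are illustrative assumptions for this sketch, not the procedure of any specific surveyed work.

```python
import numpy as np

def block_prune(weights: np.ndarray, block: int = 4, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the lowest-magnitude square blocks of a 2-D weight matrix.

    Blocks are ranked by their L2 norm and the fraction `sparsity` with the
    smallest norms is set to zero. Dimensions not divisible by `block` keep
    their trailing remainder dense. (Illustrative sketch, not a specific
    surveyed method.)
    """
    pruned = weights.copy()
    rows = (weights.shape[0] // block) * block
    cols = (weights.shape[1] // block) * block
    # Collect the L2 norm of every full block together with its position.
    norms = []
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            norms.append((np.linalg.norm(weights[i:i + block, j:j + block]), i, j))
    norms.sort(key=lambda t: t[0])
    # Zero the weakest blocks until the requested block sparsity is reached.
    for _, i, j in norms[: int(len(norms) * sparsity)]:
        pruned[i:i + block, j:j + block] = 0.0
    return pruned

if __name__ == "__main__":
    w = np.random.randn(16, 16).astype(np.float32)
    w_sparse = block_prune(w, block=4, sparsity=0.5)
    print("zero fraction:", float(np.mean(w_sparse == 0.0)))
```

Structured (block-level) zeroing of this kind is what allows accelerators to skip entire compute tiles and compress weight storage, in contrast to unstructured element-wise pruning, which is harder to exploit in hardware.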