Towards Efficient CNN Acceleration: A Review of Sparsity, Compression, and Emerging Hardware Architectures
Abstract
The rising computational demands of convolutional neural networks (CNNs), especially in edge and real-time systems, have prompted extensive research into energy-efficient, high-throughput hardware accelerators. Recent innovations span model-level optimizations such as sparsity and compression as well as circuit-level advances leveraging FPGAs, ASICs, and beyond-CMOS technologies. This review surveys five representative studies that exemplify state-of-the-art approaches in these domains. We examine sparsity-aware techniques such as block pruning and dataflows tailored to graph convolutions, highlight floating-point feature-map compression schemes that reduce off-chip memory access, and explore low-power hardware architectures including spintronic and CNTFET-based binarized neural networks (BNNs). We also discuss novel data routing mechanisms such as the dual Benes network, which enables flexible and efficient dataflow reorganization. Through comparative analysis, we identify trade-offs in accuracy, hardware cost, and scalability across platforms. Finally, we outline open challenges and propose future directions for integrating these strategies into next-generation CNN accelerators. This paper aims to give researchers a cohesive understanding of the rapidly evolving landscape of efficient deep-learning hardware design.
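As a point of reference for the model-level techniques named above, the sketch below illustrates one common form of block pruning: weight blocks are ranked by magnitude and the weakest fraction is zeroed to produce a hardware-friendly block-sparse pattern. The function name, block size, and sparsity target are illustrative assumptions for this sketch, not the procedure of any specific surveyed work.

```python
import numpy as np

def block_prune(weights: np.ndarray, block: int = 4, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the lowest-magnitude square blocks of a 2-D weight matrix.

    Blocks are ranked by their L2 norm and the fraction `sparsity` with the
    smallest norms is set to zero. Dimensions not divisible by `block` keep
    their trailing remainder dense. (Illustrative sketch, not a specific
    surveyed method.)
    """
    pruned = weights.copy()
    rows = (weights.shape[0] // block) * block
    cols = (weights.shape[1] // block) * block
    # Collect the L2 norm of every full block together with its position.
    norms = []
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            norms.append((np.linalg.norm(weights[i:i + block, j:j + block]), i, j))
    norms.sort(key=lambda t: t[0])
    # Zero the weakest blocks until the requested block sparsity is reached.
    for _, i, j in norms[: int(len(norms) * sparsity)]:
        pruned[i:i + block, j:j + block] = 0.0
    return pruned

if __name__ == "__main__":
    w = np.random.randn(16, 16).astype(np.float32)
    w_sparse = block_prune(w, block=4, sparsity=0.5)
    print("zero fraction:", float(np.mean(w_sparse == 0.0)))
```

Structured (block-level) zeroing of this kind is what allows accelerators to skip entire compute tiles and compress weight storage, in contrast to unstructured element-wise pruning, which is harder to exploit in hardware.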