FPGA Implementation of a Deep Convolutional Neural Network Hardware Accelerator
Abstract
This paper proposes an efficient deep convolutional neural network (CNN) accelerator for FPGA platforms, focusing on alleviating data-interaction bottlenecks and improving computational resource utilization. A dynamic block-caching strategy allocates on-chip BRAM and FIFO resources at run time, reducing off-chip DDR access frequency by up to 71%. In addition, 8-bit fixed-point quantization compresses weight storage to 5.73 MB while limiting the drop in mean average precision (mAP) to 3.8%. The computation module features a configurable adder tree and a pipelined design that supports adaptive switching between 3×3 and 1×1 convolution kernels. Peak throughput reaches 289 GOPS, with MAC utilization of up to 98.10%. Experimental results demonstrate that the optimized system achieves a threefold increase in inference speed at a 250 MHz clock frequency, consuming only 4.803 W and delivering an energy-efficiency ratio of 0.625 fps/W, 3.38× better than existing solutions. This work provides an efficient hardware solution for real-time object detection in resource-constrained scenarios.
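The weight compression described above can be illustrated with a minimal sketch of symmetric 8-bit fixed-point quantization. This is not the authors' implementation: the scale selection (per-tensor, max-magnitude-to-127) and the `weights` array are assumptions made for illustration only.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric 8-bit fixed-point quantization of a weight tensor.

    Maps float weights to int8 codes q = round(w / scale), where the
    scale is chosen so the largest-magnitude weight maps to +/-127.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for accuracy evaluation."""
    return q.astype(np.float32) * scale

# Example: a hypothetical layer of one million weights shrinks from
# 4 MB (float32) to 1 MB (int8), plus one float scale per tensor.
w = np.random.randn(1_000_000).astype(np.float32)
q, s = quantize_int8(w)
print(q.nbytes / 2**20, "MiB after quantization")  # ~0.95 MiB
```

A per-tensor scale keeps the hardware dequantization path to a single multiplier; finer-grained (per-channel) scales trade a little logic for accuracy, and the paper's 3.8% mAP drop is consistent with a scheme of this kind.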
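The configurable adder tree can likewise be sketched in software. The model below only mirrors the dataflow, under the assumption of a balanced binary reduction tree that sums nine products for a 3×3 kernel and degenerates to a pass-through for a 1×1 kernel; the actual design is pipelined RTL, and the function names are hypothetical.

```python
def adder_tree_sum(products):
    """Reduce MAC partial products with a balanced binary adder tree,
    mirroring the log2(N) pipeline stages of the hardware reduction."""
    level = list(products)
    while len(level) > 1:
        # Pair adjacent operands; an odd element passes through,
        # analogous to a register bypass stage in the hardware tree.
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

def conv_mac(window, kernel, ksize):
    """Multiply-accumulate for one output pixel. A 3x3 kernel feeds
    nine products into the tree; a 1x1 kernel yields a single product,
    analogous to the accelerator's adaptive kernel switching."""
    n = ksize * ksize
    products = [window[i] * kernel[i] for i in range(n)]
    return adder_tree_sum(products)

# 3x3 example: nine weights against a nine-pixel window.
print(conv_mac(list(range(9)), [1] * 9, 3))   # 36
# 1x1 example: the same datapath reduced to one product.
print(conv_mac([5], [2], 1))                  # 10
```

Reusing one reduction structure for both kernel sizes is what lets the MAC array stay busy across layer types, which is consistent with the reported peak MAC utilization of 98.10%.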