FPGA Implementation of a Deep Convolutional Neural Network Hardware Accelerator
Abstract
This paper proposes an efficient deep convolutional neural network (CNN) accelerator for FPGA platforms, focusing on alleviating data-interaction bottlenecks and improving computational resource utilization. A dynamic block-caching strategy allocates on-chip BRAM and FIFO resources at run time, reducing off-chip DDR access frequency by up to 71%. In addition, 8-bit fixed-point quantization compresses weight storage to 5.73 MB while limiting the drop in mean average precision (mAP) to 3.8%. The computation module features a configurable adder tree and a pipelined design that supports adaptive switching between 3×3 and 1×1 convolution kernels. Peak throughput reaches 289 GOPS, with MAC utilization of up to 98.10%. Experimental results demonstrate that the optimized system achieves a threefold increase in inference speed at a 250 MHz clock frequency, consuming only 4.803 W and delivering an energy-efficiency ratio of 0.625 fps/W, 3.38× better than existing solutions. This work provides an efficient hardware solution for real-time object detection in resource-constrained scenarios.
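The weight compression described above can be illustrated with a minimal sketch of symmetric 8-bit fixed-point quantization. This is not the authors' implementation: the scale selection (per-tensor, max-magnitude-to-127) and the `weights` array are assumptions made for illustration only.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric 8-bit fixed-point quantization of a weight tensor.

    Maps float weights to int8 codes q = round(w / scale), where the
    scale is chosen so the largest-magnitude weight maps to +/-127.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for accuracy evaluation."""
    return q.astype(np.float32) * scale

# Example: a hypothetical layer of one million weights shrinks from
# 4 MB (float32) to 1 MB (int8), plus one float scale per tensor.
w = np.random.randn(1_000_000).astype(np.float32)
q, s = quantize_int8(w)
print(q.nbytes / 2**20, "MiB after quantization")  # ~0.95 MiB
```

A per-tensor scale keeps the hardware dequantization path to a single multiplier; finer-grained (per-channel) scales trade a little logic for accuracy, and the paper's 3.8% mAP drop is consistent with a scheme of this kind.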
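The configurable adder tree can likewise be sketched in software. The model below only mirrors the dataflow, under the assumption of a balanced binary reduction tree that sums nine products for a 3×3 kernel and degenerates to a pass-through for a 1×1 kernel; the actual design is pipelined RTL, and the function names are hypothetical.

```python
def adder_tree_sum(products):
    """Reduce MAC partial products with a balanced binary adder tree,
    mirroring the log2(N) pipeline stages of the hardware reduction."""
    level = list(products)
    while len(level) > 1:
        # Pair adjacent operands; an odd element passes through,
        # analogous to a register bypass stage in the hardware tree.
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

def conv_mac(window, kernel, ksize):
    """Multiply-accumulate for one output pixel. A 3x3 kernel feeds
    nine products into the tree; a 1x1 kernel yields a single product,
    analogous to the accelerator's adaptive kernel switching."""
    n = ksize * ksize
    products = [window[i] * kernel[i] for i in range(n)]
    return adder_tree_sum(products)

# 3x3 example: nine weights against a nine-pixel window.
print(conv_mac(list(range(9)), [1] * 9, 3))   # 36
# 1x1 example: the same datapath reduced to one product.
print(conv_mac([5], [2], 1))                  # 10
```

Reusing one reduction structure for both kernel sizes is what lets the MAC array stay busy across layer types, which is consistent with the reported peak MAC utilization of 98.10%.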