Optimized Design of a Lightweight NPU Accelerator for the Internet of Things Based on Mixed-Precision Convolution and Systolic Arrays
Abstract
A lightweight neural processing unit (NPU) accelerator is crucial for improving the computational efficiency of neural radiance field (NeRF) 3D reconstruction tasks on IoT terminals. To address the high computational complexity and large storage overhead of the NeRF model, this study designs a lightweight NPU accelerator based on mixed-precision convolution, storing the convolution kernel weights in half-precision floating point. The design combines a three-level hierarchical storage architecture with an output-stationary dataflow. It further introduces a 16×16 systolic array, builds a collaborative "mixed-precision convolution + systolic array" architecture, and designs a weight-stationary systolic dataflow. The results show that the accelerator completes 2304 multiply-accumulate operations in a single clock cycle. The average single-scene NeRF inference latency is as low as 0.92 s, the average frame rate reaches 1.18 FPS, and the effective operation ratio is 94.15%. The average model storage footprint is 10.44 MB, the peak off-chip memory access bandwidth is only 0.96 GB/s, and NeRF reconstruction accuracy is high, with a structural similarity index measure (SSIM) of 0.98. In summary, the accelerator achieves a comprehensive balance of high performance, light weight, low power consumption, and high precision, and can provide key hardware support for the efficient deployment of NeRF 3D reconstruction tasks on IoT terminals.
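The two core ideas in the abstract, FP16 weight storage and a weight-stationary systolic dataflow, can be illustrated with a toy simulation. The sketch below is not the accelerator's actual hardware design; it is a minimal Python model in which each processing element (PE) holds one weight rounded to IEEE-754 half precision while activations stream past, and all names (`PE`, `systolic_matvec`) are hypothetical.

```python
# Illustrative sketch, not the paper's RTL: a tiny weight-stationary array
# with FP16-quantized weights (the paper stores kernel weights in half
# precision) computing a matrix-vector product.
import struct

def to_fp16(x: float) -> float:
    """Round a value to IEEE-754 half precision (struct format code 'e')."""
    return struct.unpack('e', struct.pack('e', x))[0]

class PE:
    """One processing element: keeps a stationary weight, multiply-accumulates."""
    def __init__(self, weight: float):
        self.weight = to_fp16(weight)  # weight stays resident for the whole run
        self.acc = 0.0

    def step(self, activation: float) -> None:
        self.acc += activation * self.weight

def systolic_matvec(weights, activations):
    """Compute y = W @ x with one row of PEs per output element.
    Weights never move; one activation is streamed per cycle to its
    column of PEs, and each output is the sum of partial sums along a row."""
    pe_rows = [[PE(w) for w in row] for row in weights]
    for j, a in enumerate(activations):   # cycle j: activation a_j enters column j
        for row in pe_rows:
            row[j].step(a)
    return [sum(pe.acc for pe in row) for row in pe_rows]

W = [[0.5, -1.0], [2.0, 0.25]]   # all values exactly representable in FP16
x = [4.0, 8.0]
print(systolic_matvec(W, x))     # → [-6.0, 10.0]
```

Scaling this pattern to the 16×16 array described in the abstract, with multiple MACs per PE per cycle, is how such a design can reach thousands of multiply-accumulate operations per clock.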