DLPack: A DSP-Based Low-Bitwidth Packing Architecture for Efficient 2-Bit CNN Inference on FPGA-based Edge Devices
Abstract
Convolutional neural networks (CNNs) have become a fundamental component of modern deep learning, particularly in intelligent edge systems. However, deploying CNNs on such platforms is challenging due to stringent power and compute constraints. Field-programmable gate arrays (FPGAs), with their reconfigurability and parallelism, offer a promising solution, yet their arithmetic resources are often used inefficiently for low-bitwidth inference. In this work, we present DLPack, a lightweight FPGA accelerator designed specifically for 2-bit CNN inference. DLPack introduces a structured packing technique that combines multiple low-precision multiply-accumulate (MAC) operations within a single DSP block, substantially increasing compute density and resource efficiency. The architecture further incorporates a tile-wise dataflow strategy and a streamlined control mechanism to reduce latency and power consumption. Implemented on a Xilinx UltraScale+ FPGA, DLPack achieves up to a 50% reduction in DSP usage, 83% lower power consumption, and roughly 99% lower inference latency than existing approaches. These results demonstrate that DLPack enables scalable, energy-efficient CNN inference on edge devices with limited computational budgets.
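To make the packing idea concrete, below is a minimal C sketch of how several 2-bit MACs can share a single wide hardware multiply, which is the principle a DSP-based packing scheme exploits. It assumes unsigned 2-bit operands and 8-bit guard lanes purely for illustration; the abstract does not specify DLPack's actual lane layout, operand signedness, or DSP mapping, so every constant and helper name here is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch (not DLPack's actual format): pack several 2-bit
 * activations into one wide multiplicand so that a single multiply, as
 * performed by one DSP block, yields several MAC partial products at once.
 * An 8-bit lane leaves guard bits: each 2-bit x 2-bit product fits in
 * 4 bits, so many such products can accumulate per lane without one
 * lane overflowing into the next. */

#define LANES      4
#define LANE_BITS  8

/* Pack LANES unsigned 2-bit activations into one 32-bit word. */
static uint32_t pack_acts(const uint8_t a[LANES]) {
    uint32_t packed = 0;
    for (int i = 0; i < LANES; i++)
        packed |= (uint32_t)(a[i] & 0x3) << (i * LANE_BITS);
    return packed;
}

/* One wide multiply computes w * a[i] in every lane simultaneously,
 * and the accumulator gathers per-lane sums across MAC steps. */
static void packed_mac(uint64_t *acc, uint32_t packed_acts, uint8_t w) {
    *acc += (uint64_t)packed_acts * (w & 0x3);
}

/* Extract the accumulated sum held in lane i. */
static uint32_t lane(uint64_t acc, int i) {
    return (uint32_t)(acc >> (i * LANE_BITS)) & ((1u << LANE_BITS) - 1);
}

int main(void) {
    const uint8_t acts[2][LANES] = { {1, 2, 3, 0}, {2, 2, 1, 3} };
    const uint8_t wts[2] = {3, 1};
    uint64_t acc = 0;

    for (int t = 0; t < 2; t++)      /* two MAC steps, one multiply each */
        packed_mac(&acc, pack_acts(acts[t]), wts[t]);

    for (int i = 0; i < LANES; i++)  /* expected lane sums: 5, 8, 10, 3 */
        printf("lane %d: %u\n", i, lane(acc, i));
    return 0;
}
```

Because each 2-bit-by-2-bit product needs at most 4 bits, the 8-bit lanes in this sketch absorb accumulation across many MAC steps before any lane can spill into its neighbor; signed operands would additionally require correction terms that the sketch omits.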