DLPack: A DSP-Based Low-Bitwidth Packing Architecture for Efficient 2-Bit CNN Inference on FPGA-based Edge Devices

Abstract

Convolutional neural networks (CNNs) have become a fundamental component of modern deep learning, particularly in intelligent edge systems. However, deploying CNNs on such platforms presents challenges due to stringent constraints on power and computational resources. Field-programmable gate arrays (FPGAs), known for their reconfigurability and parallelism, offer a promising solution—yet are often inefficiently utilized for low-bitwidth inference. In this work, we present DLPack, a lightweight FPGA accelerator specifically designed for 2-bit CNN inference. DLPack introduces a structured packing technique that combines multiple low-precision multiply-accumulate (MAC) operations within a single DSP block, significantly enhancing processing density and resource efficiency. The architecture further incorporates a tile-wise dataflow strategy and a streamlined control mechanism to reduce latency and power consumption. Implemented on a Xilinx UltraScale+ FPGA, DLPack achieves up to 50% reduction in DSP usage, 83% lower power consumption, and around 99% improvement in inference latency compared to existing approaches. These results demonstrate the effectiveness of DLPack in enabling scalable, energy-efficient CNN inference on edge devices with limited computational budgets.
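The core idea behind the packing technique, combining multiple low-precision multiply-accumulate operations in a single DSP multiplier, can be illustrated with a small sketch. The scheme below is an illustrative, simplified version of the general low-bitwidth packing approach (it is not DLPack's exact hardware mapping): two 2-bit weights are concatenated into one wide operand with a guard gap so that, after a single multiply, the two partial products occupy disjoint bit fields and can be sliced apart. The names and bit widths (8-bit unsigned activations, 2-bit unsigned weights) are assumptions for illustration.

```python
# Illustrative sketch: two 2-bit-weight MACs from one wide multiply,
# mimicking how a single DSP block can serve multiple low-bitwidth lanes.
ACT_BITS = 8                  # assumed activation width (unsigned)
W_BITS = 2                    # 2-bit weights, as in the paper's setting
LANE = ACT_BITS + W_BITS      # each product needs at most 10 bits

def packed_mac(a, w0, w1):
    """Compute a*w0 and a*w1 with ONE multiply by packing the weights.

    a  : unsigned activation, 0 <= a < 2**ACT_BITS
    w0 : unsigned weight for lane 0, 0 <= w0 < 2**W_BITS
    w1 : unsigned weight for lane 1, 0 <= w1 < 2**W_BITS
    """
    packed_w = w0 | (w1 << LANE)       # place w1 above the guard gap
    product = a * packed_w             # one wide multiply (one DSP slice)
    p0 = product & ((1 << LANE) - 1)   # low field  -> a * w0
    p1 = product >> LANE               # high field -> a * w1
    return p0, p1

# Example: both partial products recovered from a single multiplication.
p0, p1 = packed_mac(200, 3, 2)
print(p0, p1)  # -> 600 400
```

Because each product fits in `ACT_BITS + W_BITS` bits, the two bit fields never overlap, so the slice-and-shift extraction is exact; real designs extend this with accumulation across many such packed multiplies and with sign handling for signed operands.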