LoRPIA: Low-power Reconfigurable Pallet-Integrated Accelerator for Depthwise Separable Convolutions
Abstract
Convolutional Neural Networks (CNNs) have achieved remarkable success in tasks such as image classification and recognition, but their high computational and memory demands limit deployment on embedded devices. Depthwise Separable Convolutions (DSCs) address this challenge by reducing the number of parameters and operations while maintaining accuracy, making them an attractive choice for resource-constrained environments. Field-Programmable Gate Arrays (FPGAs) provide an energy-efficient alternative to traditional processors for accelerating CNNs. However, many state-of-the-art designs still suffer from inefficient resource usage and high power consumption because of how their hardware is implemented and optimized. In this work, we present a low-power, resource-efficient accelerator for depthwise separable convolutions, implemented entirely in SystemVerilog. The design minimizes hardware resource usage, which substantially reduces power consumption. Despite its compact footprint, the proposed accelerator maintains solid performance, achieving 8.54 FPS for MobileNetV1 and 13.05 FPS for MobileNetV2 on a Zynq XC7Z020 SoC while consuming only 1.25 W.
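The savings that make DSCs attractive can be seen with the standard cost accounting used for MobileNet-style layers. A k×k standard convolution over an H×W map with C_in input and C_out output channels costs H·W·k²·C_in·C_out multiply-accumulates (MACs). A depthwise separable convolution splits this into a per-channel k×k depthwise step (H·W·k²·C_in) and a 1×1 pointwise step (H·W·C_in·C_out), so the cost ratio is 1/C_out + 1/k². The sketch below assumes stride 1 and "same" padding, and the layer sizes are hypothetical. It illustrates the general technique only and is not a model of the accelerator's datapath.

```python
# Illustrative cost comparison: standard vs. depthwise separable convolution.
# Assumes stride 1 and "same" padding (output spatial size equals input size).

def standard_conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """MACs for a standard k x k convolution producing c_out channels."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """MACs for a k x k depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * k * k * c_in   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 convolution mixes channels
    return depthwise + pointwise


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in the depthwise plus pointwise pair (biases ignored)."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    # Hypothetical layer: 14x14 feature map, 3x3 kernel, 512 -> 512 channels.
    h, w, k, c_in, c_out = 14, 14, 3, 512, 512

    std = standard_conv_macs(h, w, k, c_in, c_out)
    dsc = depthwise_separable_macs(h, w, k, c_in, c_out)
    print(f"standard MACs:             {std:,}")
    print(f"depthwise separable MACs:  {dsc:,}")
    print(f"ratio (dsc / std):         {dsc / std:.4f}")
    print(f"closed form 1/C_out+1/k^2: {1 / c_out + 1 / k**2:.4f}")

    print(f"standard params:           {standard_conv_params(k, c_in, c_out):,}")
    print(f"depthwise separable params:{depthwise_separable_params(k, c_in, c_out):,}")
```

For this layer, the depthwise separable form needs roughly 11% of the MACs and weights of the standard convolution. Reductions of this kind let a small FPGA such as the XC7Z020 hold a useful amount of compute within its limited DSP and BRAM budget.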