LoRPIA: Low-power Reconfigurable Pallet-Integrated Accelerator for Depthwise Separable Convolutions

Sajad Eydivandi
Hakem Beitollahi

Abstract

Convolutional Neural Networks (CNNs) have achieved remarkable success in tasks such as image classification and recognition, but their high computational and memory demands limit deployment on embedded devices. Depthwise Separable Convolutions (DSCs) address this challenge by reducing the number of parameters and operations while maintaining accuracy, making them an attractive choice for resource-constrained environments. Field-Programmable Gate Arrays (FPGAs) provide an energy-efficient alternative to traditional processors for accelerating CNNs. However, many state-of-the-art designs still use resources inefficiently and consume more power than necessary because of the way their hardware is implemented and optimized. In this work, we present a low-power, resource-efficient accelerator for depthwise separable convolutions, implemented entirely in SystemVerilog. The design minimizes hardware resource usage, which leads to a notable reduction in power consumption. Despite its compact footprint, the proposed accelerator maintains solid performance, achieving 8.54 FPS for MobileNetV1 and 13.05 FPS for MobileNetV2 on a Zynq XC7Z020 SoC while consuming only 1.25 watts of power.
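
As a rough illustration of why DSCs reduce parameters and operations, the sketch below counts weights and multiply-accumulates (MACs) for a standard convolution and for a depthwise-plus-pointwise pair. The layer shape in the example is hypothetical and is not taken from the paper; biases are ignored. The last helper derives energy per frame from the reported power and frame rates, assuming the 1.25 W figure applies to both networks and stays roughly constant during inference, which the abstract does not state explicitly.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Return (weights, MACs) for a standard k x k convolution, biases ignored."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output pixel
    return params, macs


def depthwise_separable_cost(k, c_in, c_out, h_out, w_out):
    """Return (weights, MACs) for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise_params = k * k * c_in    # one k x k filter per input channel
    pointwise_params = c_in * c_out    # 1 x 1 conv that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h_out * w_out
    return params, macs


def energy_per_frame_joules(power_watts, fps):
    """Energy per frame, assuming power is constant over the measurement."""
    return power_watts / fps


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    k, c_in, c_out, h, w = 3, 64, 128, 56, 56

    std_params, std_macs = standard_conv_cost(k, c_in, c_out, h, w)
    dsc_params, dsc_macs = depthwise_separable_cost(k, c_in, c_out, h, w)

    print("standard conv weights:", std_params)
    print("DSC weights:          ", dsc_params)
    # The ratio equals 1/c_out + 1/k^2 for any layer shape.
    print("DSC / standard ratio: ", dsc_macs / std_macs)

    # Frame rates and power as reported in the abstract.
    for name, fps in (("MobileNetV1", 8.54), ("MobileNetV2", 13.05)):
        print(name, "energy per frame (J):", energy_per_frame_joules(1.25, fps))
```

For this example shape the DSC needs about 12% of the weights and MACs of the standard convolution. Under the stated assumption, the reported figures correspond to roughly 0.146 J per frame for MobileNetV1 and 0.096 J per frame for MobileNetV2.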

Version published to 10.21203/rs.3.rs-8118065/v1 on Research Square
Jan 8, 2026

Related articles

Flexible MAC Design for Sparse-Aware Deep Learning Accelerator
Authors: Chun-Lung Hsu, You-Chuan Li, Chih-Wei Liu
No evaluations. Latest version: Feb 23, 2026

Optimized Design of Lightweight NPU Accelerator for the Internet of Things Based on Mixed-precision convolution and Systolic Array
Author: HaoMiao Zhao
No evaluations. Latest version: Feb 18, 2026

Transformer Algorithmics: A Tutorial on Efficient Implementation of Transformers on Hardware
Author: Christoforos Kachris
No evaluations. Latest version: Feb 11, 2026
