Flexible MAC Design for Sparse-Aware Deep Learning Accelerator
Abstract
The increasing deployment of deep convolutional neural networks (DCNNs) in real‑time and resource‑constrained environments has intensified the demand for hardware accelerators capable of efficiently handling sparse and irregular computation patterns. Although systolic arrays offer high throughput, their rigid dataflow leads to severe processing‑element (PE) underutilization when executing unstructured sparse matrix operations, resulting in fragmented computation and unnecessary memory traffic. This work presents a flexible multiply‑accumulate (MAC) architecture that enables sparsity‑aware deep learning accelerators (SA‑DLAs) while supporting both floating‑point and fixed‑point arithmetic within a unified datapath. The proposed architecture dynamically adapts to operand sparsity and data distribution, improving PE utilization without introducing complex control overhead. A complete SA‑DLA engine incorporating the flexible MAC is implemented in TSMC 28‑nm CMOS technology and validated on FPGA. Experimental results demonstrate that the proposed design significantly improves computational efficiency under irregular workloads, achieving lower latency, lower power consumption, and higher energy efficiency than conventional dense systolic‑array‑based accelerators. These results highlight the effectiveness of the proposed architecture for next‑generation sparse‑aware AI hardware systems.
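The utilization argument above can be illustrated with a minimal sketch: a sparsity‑aware PE skips a multiply‑accumulate whenever either operand is zero, whereas a dense systolic‑array PE would still spend a cycle on it. This is only an behavioral illustration of the general zero‑skipping idea, not the paper's MAC datapath or RTL; the function name and counters are hypothetical.

```python
def sparse_mac(weights, activations):
    """Accumulate w*a only for nonzero operand pairs; count skipped ops.

    Illustrative behavioral model of zero-skipping (not the paper's design).
    """
    acc = 0
    skipped = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            skipped += 1  # a dense PE would still burn a cycle on this pair
            continue
        acc += w * a
    return acc, skipped

# Example: half of the operand pairs contribute nothing to the result.
acc, skipped = sparse_mac([3, 0, -2, 0], [1, 5, 4, 0])
# acc = 3*1 + (-2)*4 = -5; skipped = 2
```

The ratio of skipped to total pairs is a simple proxy for the wasted cycles a rigid dense dataflow incurs on unstructured sparsity, which is the gap the flexible MAC targets.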