FPGA-Accelerated Real-Time DCGANs via Xilinx DPUs and Vitis AI
Abstract
Generative Adversarial Networks (GANs) produce high-quality images but are computationally intensive, especially due to transposed convolution operations, limiting their real-time performance on traditional hardware. To address this, we propose an optimized FPGA-based acceleration framework leveraging Xilinx Deep Learning Processing Units (DPUs) and the Vitis AI toolchain to enable real-time inference of Deep Convolutional GANs (DCGANs) for image reconstruction. The proposed approach applies a two-stage quantization method that profiles layer-wise dynamic ranges and fine-tunes scale factors via host-side retraining. This enables quantization of both generator and discriminator from 32-bit floating-point to INT8 precision with minimal accuracy degradation. Additionally, structured pruning through the Vitis AI Optimizer removes redundant weights and filters, producing a compact model that fits entirely in on-chip memory and maximizes DPU efficiency. The architecture uses a multi-threaded ARM processor to manage preprocessing and DMA operations, while a lightweight scheduler in programmable logic sequences the execution of convolution kernels across multiple DPU cores. Double buffering is employed to overlap data movement with computation. Experimental results on a Zynq UltraScale+ MPSoC ZCU104 show over 105 FPS throughput, achieving up to 3.5× higher performance and 7.3× better energy efficiency than GPU/CPU baselines, with Fréchet Inception Distance (FID) scores within 5% of floating-point models.
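To make the two-stage quantization idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: stage one profiles a layer's dynamic range to derive an initial symmetric INT8 scale factor, and stage two refines that scale by minimizing reconstruction error, standing in for the host-side retraining described above. The function names (`profile_range`, `refine_scale`) and the grid-search refinement are illustrative assumptions.

```python
import numpy as np

def profile_range(activations):
    """Stage 1: record the layer's dynamic range (max absolute value)."""
    return float(np.max(np.abs(activations)))

def quantize_int8(x, scale):
    """Symmetric INT8 quantize-dequantize with a given scale factor."""
    q = np.clip(np.round(x / scale), -128, 127)
    return q * scale

def refine_scale(x, init_scale, steps=40):
    """Stage 2: fine-tune the scale factor by minimizing reconstruction MSE
    (a simple proxy for the paper's host-side retraining of scale factors)."""
    best_scale = init_scale
    best_err = np.mean((x - quantize_int8(x, init_scale)) ** 2)
    for s in np.linspace(0.5 * init_scale, 1.2 * init_scale, steps):
        err = np.mean((x - quantize_int8(x, s)) ** 2)
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=10_000)   # synthetic layer activations
scale0 = profile_range(acts) / 127.0       # initial scale from profiling
scale1 = refine_scale(acts, scale0)        # refined scale factor
```

For bell-shaped activation distributions, a scale derived purely from the observed maximum wastes representable range on rare outliers; the refinement stage typically shrinks the scale and reduces overall quantization error, which is one reason a second calibration pass helps keep FID close to the floating-point baseline.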