FPGA-Accelerated Real-Time DCGANs via Xilinx DPUs and Vitis AI
Abstract
Generative Adversarial Networks (GANs) produce high-quality images but are computationally intensive, especially due to transposed convolution operations, limiting their real-time performance on traditional hardware. To address this, we propose an optimized FPGA-based acceleration framework leveraging Xilinx Deep Learning Processing Units (DPUs) and the Vitis AI toolchain to enable real-time inference of Deep Convolutional GANs (DCGANs) for image reconstruction. The proposed approach applies a two-stage quantization method that profiles layer-wise dynamic ranges and fine-tunes scale factors via host-side retraining. This enables quantization of both generator and discriminator from 32-bit floating-point to INT8 precision with minimal accuracy degradation. Additionally, structured pruning through the Vitis AI Optimizer removes redundant weights and filters, producing a compact model that fits entirely in on-chip memory and maximizes DPU efficiency. The architecture uses a multi-threaded ARM processor to manage preprocessing and DMA operations, while a lightweight scheduler in programmable logic sequences the execution of convolution kernels across multiple DPU cores. Double buffering is employed to overlap data movement with computation. Experimental results on a Zynq UltraScale+ MPSoC ZCU104 show over 105 FPS throughput, achieving up to 3.5× higher performance and 7.3× better energy efficiency than GPU/CPU baselines, with Fréchet Inception Distance (FID) scores within 5% of floating-point models.
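To make the two-stage quantization idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: stage one profiles a layer's dynamic range to derive an initial symmetric INT8 scale factor, and stage two refines that scale by minimizing reconstruction error, standing in for the host-side retraining described above. The function names (`profile_range`, `refine_scale`) and the grid-search refinement are illustrative assumptions.

```python
import numpy as np

def profile_range(activations):
    """Stage 1: record the layer's dynamic range (max absolute value)."""
    return float(np.max(np.abs(activations)))

def quantize_int8(x, scale):
    """Symmetric INT8 quantize-dequantize with a given scale factor."""
    q = np.clip(np.round(x / scale), -128, 127)
    return q * scale

def refine_scale(x, init_scale, steps=40):
    """Stage 2: fine-tune the scale factor by minimizing reconstruction MSE
    (a simple proxy for the paper's host-side retraining of scale factors)."""
    best_scale = init_scale
    best_err = np.mean((x - quantize_int8(x, init_scale)) ** 2)
    for s in np.linspace(0.5 * init_scale, 1.2 * init_scale, steps):
        err = np.mean((x - quantize_int8(x, s)) ** 2)
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=10_000)   # synthetic layer activations
scale0 = profile_range(acts) / 127.0       # initial scale from profiling
scale1 = refine_scale(acts, scale0)        # refined scale factor
```

For bell-shaped activation distributions, a scale derived purely from the observed maximum wastes representable range on rare outliers; the refinement stage typically shrinks the scale and reduces overall quantization error, which is one reason a second calibration pass helps keep FID close to the floating-point baseline.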