ViT-StyleGAN2-ADA for Limited-Data Training
Abstract
Generative Adversarial Networks (GANs) have demonstrated impressive performance in synthesizing high-fidelity images but often suffer from discriminator overfitting when training data is limited. Adaptive Discriminator Augmentation (ADA) mitigates this overfitting but fails to preserve global structure. To bridge this gap, we integrate a multi-scale Vision Transformer (ViT)-based discriminator into the StyleGAN2-ADA framework; its global self-attention models both local texture and global structure, which stabilizes training and reduces mode collapse. To extend ADA's non-leaking stochastic transformations, we introduce two additional operations, patch dropout and patch shuffle, which further diversify the discriminator's input without compromising the learning signal. Moreover, we incorporate advanced augmentation strategies whose application probabilities are dynamically adjusted based on feedback from the discriminator, ensuring adaptive and effective regularization throughout training. We also modify the loss functions, introducing token-based Path Length Regularization and gradient penalties tailored to the ViT discriminator, to improve training stability and convergence. Experiments on multiple datasets show that the proposed approach matches or outperforms the baseline StyleGAN2-ADA and other state-of-the-art GANs. These findings position ViT-D-StyleGAN2-ADA as a powerful solution for generative modeling in data-constrained scenarios. Code and models are available at: https://github.com/mahabub657fy3/ViT-D-StyleGAN2-ADA.
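For intuition, below is a minimal PyTorch sketch of the two patch-level operations described in the abstract, applied with an ADA-style probability p. The function names patch_dropout and patch_shuffle, the 16x16 patch size, and the per-patch drop rate are illustrative assumptions and do not represent the released implementation (see the repository linked above for the authors' code).

```python
import torch

def patch_dropout(x, p, patch=16, drop_rate=0.25):
    """With probability p per image, zero out a random subset of its patches.
    drop_rate (assumed here) is the fraction of patches removed when applied."""
    b, c, h, w = x.shape
    gh, gw = h // patch, w // patch
    # Per-image, per-patch keep mask, upsampled back to pixel resolution.
    keep = (torch.rand(b, 1, gh, gw, device=x.device) > drop_rate).float()
    keep = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    # Select which images in the batch receive the augmentation (ADA-style gating).
    apply = (torch.rand(b, 1, 1, 1, device=x.device) < p).float()
    return x * (1.0 - apply) + x * keep * apply

def patch_shuffle(x, p, patch=16):
    """With probability p per image, randomly permute the spatial order of its patches."""
    b, c, h, w = x.shape
    gh, gw = h // patch, w // patch
    # Split into non-overlapping patches: (B, C, gh, gw, patch, patch) -> (B, C, N, patch, patch)
    patches = x.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.reshape(b, c, gh * gw, patch, patch)
    out = []
    for i in range(b):
        if torch.rand(()) < p:
            perm = torch.randperm(gh * gw, device=x.device)
            out.append(patches[i, :, perm])
        else:
            out.append(patches[i])
    patches = torch.stack(out)
    # Reassemble patches into an image of the original size.
    patches = patches.reshape(b, c, gh, gw, patch, patch)
    return patches.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)

# Example: both real and generated images would pass through the same stochastic
# augmentations before the ViT discriminator, with p driven by ADA's overfitting heuristic.
imgs = torch.randn(4, 3, 256, 256)
aug = patch_shuffle(patch_dropout(imgs, p=0.3), p=0.3)
```

Because the augmentations are applied stochastically and identically to real and generated images, the generator is not rewarded for reproducing them, which is the non-leaking property ADA relies on.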