IterVocoder: Fast High-Fidelity Speech Synthesis via GAN-Guided Iterative Refinement

Liam Bennett
Emily Marwood
Avery Thompson

Abstract

Recent neural vocoders have substantially improved the naturalness of synthesized speech. Among these, denoising diffusion probabilistic models (DDPMs) and generative adversarial networks (GANs) stand out for their ability to produce high-fidelity audio. However, DDPMs typically require a large number of iterative steps, and GANs often suffer from training instability. To address both limitations, we propose IterVocoder, a novel non-autoregressive neural vocoder that unifies fixed-point iteration and adversarial learning. By applying a deep denoising network iteratively and enforcing consistency through adversarial objectives at each refinement stage, IterVocoder achieves high-quality waveform synthesis in just a few iterations. Experimental results show that IterVocoder synthesizes speech with perceptual quality on par with human speech while running over 200× faster than autoregressive models. This makes IterVocoder a practical solution for real-time neural vocoding applications.
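The iterative refinement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_denoiser`, `iterative_refinement`, `strength`, and `num_iterations` are hypothetical names, the denoiser is a simple contraction standing in for the trained deep network, the conditioning is taken to be the target waveform itself (a real vocoder only sees acoustic features), and the adversarial objective applied at each stage is omitted entirely. The sketch only shows the control flow of applying one denoiser repeatedly to an initial noise signal.

```python
import math
import random


def toy_denoiser(estimate, conditioning, strength=0.5):
    """Hypothetical stand-in for the trained denoising network.

    Moves each sample a fraction `strength` of the way toward the
    conditioning signal. A real vocoder would instead run a neural
    network conditioned on acoustic features.
    """
    return [x + strength * (c - x) for x, c in zip(estimate, conditioning)]


def iterative_refinement(conditioning, num_iterations=4, seed=0):
    """Apply the same denoiser repeatedly, starting from Gaussian noise."""
    rng = random.Random(seed)
    estimate = [rng.gauss(0.0, 1.0) for _ in conditioning]
    history = [estimate]  # keep every intermediate estimate for inspection
    for _ in range(num_iterations):
        estimate = toy_denoiser(estimate, conditioning)
        history.append(estimate)
    return history


def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy target: 100 samples of a sine wave (assumption, for illustration only).
target = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
history = iterative_refinement(target)
errors = [rms_error(h, target) for h in history]
```

In this toy setting each pass halves the remaining error, so a handful of iterations is enough to get close to the target. How quickly the actual model converges depends on the trained network and is not something this sketch can show.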

Version published to 10.20944/preprints202506.2168.v1
Jun 26, 2025

Related articles

Fourier-Enhanced TecoGAN: Advancing Video Super-Resolution with Spectral and Gradient Losses

This article has 2 authors:
1. Md. Asif Hasan
2. Radee Jamil Khan
This article has no evaluations. Latest version: Jan 9, 2026

MDL-AE: Investigating the Trade-Off Between Compressive Fidelity and Discriminative Utility in Self-Supervised Learning

This article has 2 authors:
1. Zaryab Rahman
2. Mattia Ottoborgo
This article has no evaluations. Latest version: Jan 12, 2026

An Integrated Framework of Frequency-Domain Denoising with Learnable Parameters in Variational Autoencoders

This article has 3 authors:
1. Xiaochen Li
2. Hongtian Zhao
3. Peng Li
This article has no evaluations. Latest version: Jan 6, 2026
