Fourier-Enhanced TecoGAN: Advancing Video Super-Resolution with Spectral and Gradient Losses
Abstract
Video super-resolution (VSR) aims to reconstruct high-resolution (HR) video frames from low-resolution (LR) inputs by exploiting both spatial details and temporal dependencies across frames. The TecoGAN framework introduced adversarial training with spatio-temporal discriminators and a novel Ping-Pong loss to generate perceptually realistic and temporally coherent videos. However, TecoGAN's generator architecture relies on transposed convolutions, which can introduce checkerboard artifacts, and has limited capacity to recover fine high-frequency details. In this paper, we propose Fourier-Enhanced TecoGAN, an improved GAN-based VSR framework that addresses these limitations through architectural and loss-function enhancements. Specifically, we adopt Residual-in-Residual Dense Blocks (RRDBs) from ESRGAN to strengthen feature extraction, and replace transposed-convolution upsampling with PixelShuffle layers to reduce artifacts. We further introduce a frequency-aware loss that operates in the Fourier domain to emphasize high-frequency reconstruction, along with a gradient-magnitude loss to better preserve edges and structural details. To improve training stability, we employ a hinge-based adversarial loss and apply spectral normalization to the discriminator. Experimental results on the REDS dataset demonstrate that the proposed method consistently outperforms the original TecoGAN, achieving improved PSNR and SSIM while reducing LPIPS by approximately 20% and lowering temporal optical flow error (tOF). Qualitative results further show sharper textures and improved temporal stability compared to prior state-of-the-art VSR methods. Code and pretrained models will be released upon publication.
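To make the two proposed loss terms concrete, the following is a minimal NumPy sketch of a Fourier-domain magnitude loss and a gradient-magnitude loss for a single frame. The function names, the use of plain L1 distances, and the absence of any per-frequency weighting are illustrative assumptions; the paper's exact formulation and loss weights are not given in the abstract.

```python
import numpy as np

def fourier_loss(sr, hr):
    """Sketch of a frequency-aware loss: L1 distance between the 2D FFT
    magnitude spectra of the super-resolved (sr) and ground-truth (hr) frames.
    Penalizing magnitude differences emphasizes high-frequency reconstruction."""
    sr_mag = np.abs(np.fft.fft2(sr))
    hr_mag = np.abs(np.fft.fft2(hr))
    return float(np.mean(np.abs(sr_mag - hr_mag)))

def gradient_loss(sr, hr):
    """Sketch of a gradient-magnitude loss: L1 distance between finite-difference
    gradient magnitudes, encouraging preservation of edges and structure."""
    def grad_mag(x):
        gy = np.diff(x, axis=0)[:, :-1]   # vertical differences
        gx = np.diff(x, axis=1)[:-1, :]   # horizontal differences
        return np.sqrt(gx ** 2 + gy ** 2)
    return float(np.mean(np.abs(grad_mag(sr) - grad_mag(hr))))
```

In a full training setup these terms would be added to the pixel, perceptual, Ping-Pong, and hinge adversarial losses with scalar weights; in practice they would be implemented with differentiable FFT and gradient ops in the training framework rather than NumPy.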