Novel Nesting of Deep Learning Domain Transfer and Hybrid Video Coding for Video Compression
Abstract
Efficient video compression is crucial for addressing the exponential growth of video content, which now constitutes a significant portion of global internet traffic. Traditional compression standards mainly include H.264 and H.265, while the current research trend is to partially or completely replace the architectures of these traditional methods with deep learning techniques. However, the two approaches are not mutually exclusive. Based on this idea, this paper proposes a new direction that combines traditional video compression methods with deep learning techniques to achieve higher compression efficiency and improved reconstruction quality. We adopt a two-stage compression framework in which video frames are first reduced in resolution via bicubic downsampling and then encoded with traditional codecs such as H.264 or H.265. Subsequently, a deep learning-based Video Super-Resolution model restores the compressed video frames. A further challenge is to construct structured temporal priors at different semantic levels so as to implicitly model the abstraction process from local to global representations. To address this, our Video Super-Resolution model includes a specially designed domain-transfer module that adaptively processes structured temporal priors at different semantic levels. Moreover, unlike traditional compression methods, deep learning-based compression algorithms place high demands on computational resources; most existing methods cannot perform 2160P video compression on a single RTX 4090. We therefore design a Hierarchical Simplified Attention-Net that reduces model complexity and can perform compression at resolutions up to 2160P on a single RTX 4090 GPU. Finally, our model achieves strong results on benchmark datasets including UVG, MCL-JCV, and the HEVC Class B, C, D, and E sequences.
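To make the two-stage framework concrete, the sketch below illustrates the overall flow: bicubic downsampling of the input frames, compression of the low-resolution video with a standard H.265 encoder invoked through ffmpeg, and restoration of the decoded frames with a learned Video Super-Resolution network. This is only a minimal illustration; the `vsr_model` object, its call signature, and the specific ffmpeg parameters are hypothetical placeholders and do not reflect the paper's actual implementation.

```python
import subprocess
import torch
import torch.nn.functional as F


def downsample_frames(frames, scale=0.5):
    # frames: (T, C, H, W) float tensor in [0, 1].
    # Bicubic downsampling before handing the video to the traditional codec.
    return F.interpolate(frames, scale_factor=scale,
                         mode="bicubic", align_corners=False)


def encode_with_hevc(raw_yuv_path, out_path, width, height, fps=30, crf=28):
    # Stage 1: compress the downsampled video with a standard H.265
    # encoder (libx265) via ffmpeg. CRF and fps are illustrative values.
    subprocess.run([
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", raw_yuv_path,
        "-c:v", "libx265", "-crf", str(crf),
        out_path,
    ], check=True)


def restore_with_vsr(decoded_frames, vsr_model, scale=2):
    # Stage 2: a learned Video Super-Resolution model (hypothetical
    # interface) maps decoded low-resolution frames back to the
    # original resolution.
    with torch.no_grad():
        return vsr_model(decoded_frames, scale=scale)
```

The design choice behind this split is that the traditional codec handles the bulk of the bit-rate reduction at low resolution, while the learned model recovers the detail lost to downsampling and quantization.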