A Lightweight Perceptual-Guided VQVAE for High-Fidelity Image Compression
Abstract
This study addresses the quality-efficiency trade-off of generative models for low-bit-rate image compression by proposing HiRes-VQ, a lightweight perceptually guided VQ-VAE framework. The framework inherits the efficient hierarchical quantization architecture of VQ-VAE-2 and introduces a super-resolution codec. By employing parallel pathways for low-frequency structure reconstruction and high-frequency detail restoration, it decouples frequency-domain features. In addition, a multi-scale perceptual alignment loss guides the model toward feature representations aligned with human visual perception. Experiments on the FFHQ-256 and ImageNet-256 datasets demonstrate that our model significantly outperforms lightweight baselines across all metrics and surpasses high-complexity models in quality-efficiency balance.
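To illustrate the general idea behind a multi-scale perceptual alignment loss, the sketch below compares two images in a feature space at several spatial scales. This is a minimal, hypothetical illustration only: the function names, the box-filter downsampling, the fixed linear map `feat_w` standing in for a learned perceptual feature extractor, and the choice of scales are all assumptions, not the paper's actual design.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool an (H, W, C) image by an integer factor (box filter)."""
    h, w, c = img.shape
    h2, w2 = h // factor, w // factor
    return img[:h2 * factor, :w2 * factor].reshape(
        h2, factor, w2, factor, c).mean(axis=(1, 3))

def multiscale_perceptual_loss(recon, target, feat_w, scales=(1, 2, 4)):
    """Mean L1 distance between per-pixel features at multiple scales.

    feat_w is a (C, K) matrix acting as a stand-in "perceptual" feature
    extractor (hypothetical; in practice this role is played by a learned
    network). Averaging over scales makes the loss sensitive to both
    coarse structure and fine detail.
    """
    total = 0.0
    for s in scales:
        r = downsample(recon, s) @ feat_w   # features of the reconstruction
        t = downsample(target, s) @ feat_w  # features of the reference
        total += np.abs(r - t).mean()       # L1 in feature space
    return total / len(scales)
```

Identical inputs yield zero loss, and the multi-scale average penalizes mismatches at every resolution, which is the intuition behind aligning reconstructions with human perception across frequency bands.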