AnimeINR: Spatio-Temporal Implicit Neural Representation for Arbitrary-Scale Animation Video Super-Resolution
Abstract
Existing methods for video super-resolution (VSR) and video frame interpolation (VFI) primarily focus on designing general pipelines for open-domain videos. However, these approaches tend to overlook the inherent characteristics of animation data. Specifically, animation often features lines and smooth, texture-poor regions, which complicate the estimation of inter-frame motion. Moreover, the exaggerated expressions common in animation introduce non-linear and large motions, which can significantly limit the performance of existing methods when applied to the animation domain.

In this paper, we present the first attempt to apply Implicit Neural Representation to Space-Time Video Super-Resolution (STVSR) at arbitrary spatial and temporal scales in the animation domain. To this end, we propose a novel unified pipeline named AnimeINR, comprising three specialized modules: a Spatial Implicit Neural Representation, which defines a continuous feature domain for decoding arbitrary-scale 2D spatial coordinates into corresponding features; a mask-guided Motion Latent Learning module, which predicts motion flow between adjacent frames to enable accurate feature warping; and a Temporal Implicit Neural Representation module, which applies 3D sampling to extract relative spatial and temporal information and decodes it into RGB values. Additionally, we curate a real-world animation video dataset to evaluate state-of-the-art STVSR methods. Experimental results demonstrate that our AnimeINR framework achieves superior performance for animation STVSR at arbitrary scales.
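The abstract does not give implementation details, but the following minimal PyTorch sketch illustrates the general idea behind an arbitrary-scale spatial implicit neural representation: continuous query coordinates (x, y, t) are used to sample an encoder feature grid, and a small MLP decodes the sampled features together with the coordinates into RGB. All names, layer sizes, and the sampling scheme here are assumptions for illustration, not the paper's actual modules (motion latent learning and mask guidance are omitted).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordinateDecoder(nn.Module):
    """Minimal MLP mapping (sampled feature, query coordinate) pairs to RGB."""
    def __init__(self, feat_dim=64, coord_dim=3, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + coord_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),  # RGB output
        )

    def forward(self, feats, coords):
        # feats: (B, N, feat_dim); coords: (B, N, coord_dim), spatial part in [-1, 1]
        return self.mlp(torch.cat([feats, coords], dim=-1))

def query_rgb(feature_grid, coords, decoder):
    """Sample a 2D feature grid at continuous (x, y) locations and decode RGB.

    feature_grid: (B, C, H, W) encoder features from a low-resolution frame.
    coords:       (B, N, 3) queries with (x, y) in [-1, 1] and a normalized time t.
    """
    xy = coords[..., :2].unsqueeze(1)                          # (B, 1, N, 2)
    sampled = F.grid_sample(feature_grid, xy,
                            mode='bilinear', align_corners=False)
    sampled = sampled.squeeze(2).permute(0, 2, 1)              # (B, N, C)
    return decoder(sampled, coords)                            # (B, N, 3)

# Example: decode a 2x-upscaled query at an intermediate time step.
B, C, H, W = 1, 64, 32, 32
feature_grid = torch.randn(B, C, H, W)
decoder = CoordinateDecoder(feat_dim=C)

ys, xs = torch.meshgrid(torch.linspace(-1, 1, 2 * H),
                        torch.linspace(-1, 1, 2 * W), indexing='ij')
t = torch.full_like(xs, 0.5)                                   # halfway between two frames
coords = torch.stack([xs, ys, t], dim=-1).reshape(1, -1, 3)
rgb = query_rgb(feature_grid, coords, decoder)                  # (1, 4096, 3)
print(rgb.shape)
```

Because the query grid is built from continuous coordinates rather than a fixed upscaling factor, the same decoder can in principle be evaluated at any spatial resolution and any intermediate time step, which is the property the arbitrary-scale STVSR setting relies on.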