AnimeINR: Spatio-Temporal Implicit Neural Representation for Arbitrary-Scale Animation Video Super-Resolution
Abstract
Existing methods for video super-resolution (VSR) and video frame interpolation (VFI) primarily focus on designing general pipelines for open-domain videos. However, these approaches tend to overlook the inherent characteristics of animation data. Specifically, animation often features lines and smooth, texture-poor regions, which complicate the estimation of inter-frame motion. Moreover, the exaggerated expressions common in animation introduce non-linear and large motions, which can significantly limit the performance of existing methods when applied to the animation domain.

In this paper, we present the first attempt to apply Implicit Neural Representation to Space-Time Video Super-Resolution (STVSR) at arbitrary spatial and temporal scales in the animation domain. To this end, we propose a novel unified pipeline named AnimeINR, comprising three specialized modules: a Spatial Implicit Neural Representation, which defines a continuous feature domain for decoding arbitrary-scale 2D spatial coordinates into corresponding features; a mask-guided Motion Latent Learning module, which predicts motion flow between adjacent frames to enable accurate feature warping; and a Temporal Implicit Neural Representation module, which applies 3D sampling to extract relative spatial and temporal information and decodes it into RGB values. Additionally, we curate a real-world animation video dataset to evaluate state-of-the-art STVSR methods. Experimental results demonstrate that our AnimeINR framework achieves superior performance for animation STVSR at arbitrary scales.
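The abstract does not give implementation details, but the following minimal PyTorch sketch illustrates the general idea behind an arbitrary-scale spatial implicit neural representation: continuous query coordinates (x, y, t) are used to sample an encoder feature grid, and a small MLP decodes the sampled features together with the coordinates into RGB. All names, layer sizes, and the sampling scheme here are assumptions for illustration, not the paper's actual modules (motion latent learning and mask guidance are omitted).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordinateDecoder(nn.Module):
    """Minimal MLP mapping (sampled feature, query coordinate) pairs to RGB."""
    def __init__(self, feat_dim=64, coord_dim=3, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + coord_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),  # RGB output
        )

    def forward(self, feats, coords):
        # feats: (B, N, feat_dim); coords: (B, N, coord_dim), spatial part in [-1, 1]
        return self.mlp(torch.cat([feats, coords], dim=-1))

def query_rgb(feature_grid, coords, decoder):
    """Sample a 2D feature grid at continuous (x, y) locations and decode RGB.

    feature_grid: (B, C, H, W) encoder features from a low-resolution frame.
    coords:       (B, N, 3) queries with (x, y) in [-1, 1] and a normalized time t.
    """
    xy = coords[..., :2].unsqueeze(1)                          # (B, 1, N, 2)
    sampled = F.grid_sample(feature_grid, xy,
                            mode='bilinear', align_corners=False)
    sampled = sampled.squeeze(2).permute(0, 2, 1)              # (B, N, C)
    return decoder(sampled, coords)                            # (B, N, 3)

# Example: decode a 2x-upscaled query at an intermediate time step.
B, C, H, W = 1, 64, 32, 32
feature_grid = torch.randn(B, C, H, W)
decoder = CoordinateDecoder(feat_dim=C)

ys, xs = torch.meshgrid(torch.linspace(-1, 1, 2 * H),
                        torch.linspace(-1, 1, 2 * W), indexing='ij')
t = torch.full_like(xs, 0.5)                                   # halfway between two frames
coords = torch.stack([xs, ys, t], dim=-1).reshape(1, -1, 3)
rgb = query_rgb(feature_grid, coords, decoder)                  # (1, 4096, 3)
print(rgb.shape)
```

Because the query grid is built from continuous coordinates rather than a fixed upscaling factor, the same decoder can in principle be evaluated at any spatial resolution and any intermediate time step, which is the property the arbitrary-scale STVSR setting relies on.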