ArtScale: Autoregressive Super-Resolution for Art Paintings via Multi-Scale Vision-Language Guidance
Abstract
Heritage digitization can require extreme super-resolution (SR) for inspecting brushwork, craquelure, and pigment aging beyond native capture limits. Most SR models are trained for fixed scale factors and degrade when extrapolated beyond them, while training directly for extreme scales is expensive. We present ArtScale, a scale-space autoregressive framework that reaches large magnifications by chaining intermediate steps while reusing a frozen SR backbone. To limit semantic drift at high magnification, ArtScale adds multi-scale vision-language guidance: a vision-language model (VLM) generates art-aware prompts conditioned on the current and previous scale states. We fine-tune the prompt extractor with GRPO-based preference alignment to reduce repetitive or generic prompts. Experiments show improved 4× restoration and more stable behavior under recursive zooming.
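The chaining idea can be illustrated with a minimal sketch. It is not the authors' implementation: `frozen_backbone` is a hypothetical stand-in that performs nearest-neighbour upscaling in place of a real frozen SR model, `stub_prompter` is a hypothetical stand-in for the VLM prompt extractor, and the per-step factor of 4 is an assumption. The abstract does not say whether a region is cropped between steps, so the sketch simply upscales the whole image each time.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def frozen_backbone(image: Image, prompt: str, factor: int = 4) -> Image:
    """Hypothetical stand-in for a frozen fixed-scale SR model.

    Uses nearest-neighbour upscaling; the prompt is accepted but ignored,
    whereas a real backbone would condition on it.
    """
    out: Image = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]  # repeat columns
        out.extend(list(wide) for _ in range(factor))     # repeat rows
    return out


def stub_prompter(current: Image, previous: Optional[Image]) -> str:
    """Hypothetical stand-in for the VLM: sees current and previous states."""
    size = f"{len(current)}x{len(current[0])}"
    if previous is None:
        return f"painting detail at {size}, no previous scale"
    prev_size = f"{len(previous)}x{len(previous[0])}"
    return f"painting detail at {size}, previous scale {prev_size}"


def recursive_zoom(
    image: Image,
    steps: int,
    backbone: Callable[[Image, str], Image] = frozen_backbone,
    prompter: Callable[[Image, Optional[Image]], str] = stub_prompter,
) -> Tuple[Image, List[str]]:
    """Chain fixed-scale SR steps; each prompt depends on two scale states."""
    current: Image = image
    previous: Optional[Image] = None
    prompts: List[str] = []
    for _ in range(steps):
        prompt = prompter(current, previous)  # multi-scale guidance
        upscaled = backbone(current, prompt)  # same frozen model every step
        previous, current = current, upscaled
        prompts.append(prompt)
    return current, prompts


if __name__ == "__main__":
    tiny = [[0.0, 1.0], [2.0, 3.0]]
    result, used_prompts = recursive_zoom(tiny, steps=2)
    print(len(result), len(result[0]))  # 32 32
    for p in used_prompts:
        print(p)
```

With two chained steps at an assumed factor of 4 each, the overall magnification is 16×, which is how a backbone trained for one fixed scale can reach a larger one without retraining.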