Multi-Scale Mixture-of-Experts ControlNet for Real-World Movie Scene Image Super-Resolution
Abstract
Image super-resolution (SR) plays a critical role in enhancing visual quality and improving the performance of downstream vision tasks. However, existing SR methods are predominantly trained and evaluated on small-scale, standardized datasets, which limits their generalization and robustness in complex real-world scenarios. As a representative application domain, movie scenes exhibit high structural complexity and visual diversity, often containing special effects, filters, and other non-natural elements that pose additional challenges for SR models. With the rapid development of the film industry and computer vision, a vast amount of high-quality imagery has become available on the web, offering rich external priors that can potentially enhance SR performance. Motivated by this, we propose a novel reference-based SR framework for movie scenes, termed Multi-Scale Mixture-of-Experts ControlNet (MMoEControl). Our approach first retrieves semantically or structurally similar high-quality images from web-scale data based on features extracted from the low-resolution (LR) input, forming a reference image set. We then design a Multi-Scale Mixture-of-Experts (MMoE) framework built upon an improved ControlNet architecture, which injects multi-scale reference information into a frozen pre-trained diffusion model to guide the generation of high-resolution (HR) outputs. The core contributions of our method are an SR-guided reference image retrieval module and a multi-scale conditional ControlNet, which jointly integrate structural and textural cues from the references while leveraging diffusion priors to mitigate the limitations of standard training datasets. Compared to conventional “blind” SR methods that operate without external guidance, MMoEControl explicitly “copies” beneficial features from relevant reference images, significantly improving structural fidelity and detail reconstruction.
Experimental results demonstrate that our approach consistently outperforms existing methods on various real-world movie scene datasets, highlighting its strong generalization ability and practical value.
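To make the two-stage idea concrete — retrieve similar high-quality references for an LR input, then fuse multi-scale reference cues with softmax-gated experts — here is a minimal sketch in NumPy. Everything in it is an illustrative assumption: the random-projection encoder stands in for a learned feature extractor, and the function names (`retrieve_references`, `moe_fuse`) are hypothetical, not the paper's actual modules.

```python
import numpy as np

def extract_features(img, dim=64, seed=0):
    # Stand-in for a learned encoder: random projection of flattened pixels,
    # L2-normalized so dot products behave like cosine similarity.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((img.size, dim))
    f = img.ravel() @ proj
    return f / (np.linalg.norm(f) + 1e-8)

def retrieve_references(lr_img, database, k=3):
    """Top-k cosine-similarity retrieval of reference images for an LR input."""
    q = extract_features(lr_img)
    sims = [float(q @ extract_features(ref)) for ref in database]
    order = np.argsort(sims)[::-1][:k]
    return [database[i] for i in order], [sims[i] for i in order]

def moe_fuse(scale_features, gate_logits):
    """Softmax-gated mixture over per-scale expert features (all same shape),
    analogous to weighting reference information at multiple scales."""
    w = np.exp(gate_logits - np.max(gate_logits))
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, scale_features))
```

In the actual framework, the fused reference features would be injected through ControlNet-style conditioning branches into a frozen diffusion backbone; this sketch only illustrates the retrieval-then-gated-fusion control flow.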