Qwen-Edit+: Scaling Image Editing with VLM-Guided Consistency and Aesthetic Preference Distillation
Abstract
Instruction-based image editing has advanced substantially with the emergence of Diffusion Transformers (DiTs). However, a central challenge remains unresolved: how to accurately execute complex editing instructions while preserving the structural consistency and visual quality of the source image. Existing methods are primarily limited by three factors: noisy and imbalanced training data, insufficient structural supervision, and inadequate alignment with human aesthetic preferences. To address these issues, we propose Qwen-Edit+, a unified framework for image editing. Specifically, we first introduce Semantic-Consistency Aware Filtering (SCAF) and Distribution-Adaptive Sampling (DAS) to construct high-quality, category-balanced training data. We then propose a VLM-aware Consistency Loss (VCL), which exploits the hierarchical hidden states of Qwen2.5-VL to provide deep semantic and structural supervision. Finally, we incorporate Aesthetic Preference Distillation (APD) to further improve visual harmony and perceptual quality. In comparative experiments on the Qwen-Consistent-Edit-1.2K benchmark, our method achieves a CLIP Score of 0.347, an LPIPS of 0.219, a PSNR of 25.63 dB, and an Aesthetic Score of 6.31, outperforming representative baselines in editability, structural fidelity, and visual quality.
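To make the VCL idea concrete, below is a minimal PyTorch sketch of one plausible form of such a loss: a layer-weighted cosine distance between the hierarchical hidden states a frozen VLM vision tower produces for the edited image and for the ground-truth target. The function name, the uniform layer weights, and the tensor shapes are illustrative assumptions, not the paper's actual implementation; the toy tensors stand in for Qwen2.5-VL features.

import torch
import torch.nn.functional as F

def vlm_consistency_loss(edited_states, target_states, weights=None):
    """Layer-weighted cosine distance over hierarchical VLM hidden states.

    edited_states, target_states: lists of per-layer tensors with shape
    (batch, tokens, dim), assumed to come from a frozen VLM vision tower.
    weights: optional per-layer weights; defaults to uniform (an assumption).
    """
    if weights is None:
        weights = [1.0 / len(edited_states)] * len(edited_states)
    loss = edited_states[0].new_zeros(())
    for w, h_edit, h_tgt in zip(weights, edited_states, target_states):
        # 1 - cosine similarity per token, averaged over tokens and batch;
        # the target features are detached so gradients flow only through
        # the edited branch.
        sim = F.cosine_similarity(h_edit, h_tgt.detach(), dim=-1)
        loss = loss + w * (1.0 - sim).mean()
    return loss

# Toy usage with random features standing in for four VLM layers.
if __name__ == "__main__":
    target = [torch.randn(2, 196, 1024) for _ in range(4)]
    edited = [h + 0.1 * torch.randn_like(h) for h in target]
    print(vlm_consistency_loss(edited, target).item())

Supervising across several layers rather than only the final one is what would give the loss both semantic (deep-layer) and structural (shallow-layer) signal, which matches the abstract's description of "deep semantic and structural supervision."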