Irrelevant Region Preserving for Counterfactual Image Manipulation


Abstract

Image manipulation is one of the most significant and promising research topics in multimodal learning. Several recent methods based on Contrastive Language-Image Pre-training (CLIP) have achieved high-resolution image editing, but two challenging problems remain unsolved: complex editing and attribute disentanglement. In this paper, we propose an image editing method that combines a strong capability for complex editing with accurate protection of irrelevant attributes, addressing both challenges simultaneously. To obtain a more comprehensive semantic representation, we design a simple but effective structure based on the cross-attention mechanism, which allows better fusion of text and image features. In addition, a mask-controlled method is applied to keep the semantics of irrelevant regions unchanged after editing. We conduct extensive experiments and analysis to evaluate the generative capability of our method. The results demonstrate that our design achieves effective semantic representation and accurate editing, and that it outperforms the compared methods in image quality.
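
The abstract names two mechanisms without giving details: cross-attention fusion of text and image features, and mask-controlled preservation of irrelevant regions. The NumPy sketch below illustrates the general idea of each and is not the paper's architecture. It assumes single-head scaled dot-product attention with no learned projections, image tokens as queries, text tokens as keys and values, and a soft mask in [0, 1] where 1 marks the region to edit. The function names `cross_attention` and `preserve_irrelevant_regions` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(image_tokens, text_tokens):
    """Fuse text into image features (illustrative, no learned weights).

    image_tokens: (n_img, d) array used as queries.
    text_tokens:  (n_txt, d) array used as keys and values.
    Returns an (n_img, d) array of text-conditioned features.
    """
    d = image_tokens.shape[-1]
    scores = image_tokens @ text_tokens.T / np.sqrt(d)  # (n_img, n_txt)
    weights = softmax(scores, axis=-1)                  # each row sums to 1
    return weights @ text_tokens


def preserve_irrelevant_regions(original, edited, mask):
    """Blend the edited image into the original under a mask.

    original, edited: (H, W, C) arrays.
    mask: (H, W) array in [0, 1]; 1 = edit this pixel, 0 = keep original.
    """
    m = mask[..., None]  # broadcast the mask over the channel axis
    return m * edited + (1.0 - m) * original
```

With this blending rule, pixels where the mask is 0 are copied from the original image unchanged, which is one simple way to keep irrelevant regions fixed; the paper's actual mask-controlled method may operate on features or latents rather than pixels.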
