EFE-SDG: Efficient Feature Extraction of Finetuning-Free Model in Subject-Driven Generation
Abstract
Text-to-image models mark a major advance in AI, and subject-driven generation techniques hold considerable promise. However, currently available subject-driven models still require a trade-off between subject fidelity and the flexibility of text-to-image generation, and they cannot achieve a comprehensive understanding of the reference image from a limited amount of data. To address these limitations, we propose the EFE-SDG model, which consists of three blocks: (i) the MOCI block employs GPT-4o and other large models to process the reference image and extract additional information about it; (ii) the ASCA module, built on a decoupling strategy for the cross-attention module, performs further processing of the reference image to obtain more detailed and richer high-level subject features; and (iii) the Subject Feature Adaptive Attention Rules module fuses the low-level features extracted by the ref-diffusion model with the high-level features extracted by ASCA at the attention layers of the main-diffusion model. In addition, we use the ref-diffusion model to extract low-level features from the reference image and feed them to the main-diffusion model, which avoids the mismatch between the training distributions of other encoders and the main-diffusion model. In our comparative experiments, our approach achieves image-text alignment comparable to that of other methods while using substantially smaller datasets and fewer computational resources during the training phase. Our code will be available at: https://gitee.com/yongzhenke/efe-sdg
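To make the attention-level fusion described in the abstract concrete, the sketch below illustrates one common way to realize decoupled cross-attention: the main-diffusion query attends separately to text embeddings and to subject features, and the two results are blended with a fusion weight. This is a minimal illustration, not the authors' actual implementation; the class, parameter names, and the scalar fusion weight are all assumptions for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedCrossAttention(nn.Module):
    """Illustrative decoupled cross-attention (hypothetical, not EFE-SDG's
    exact code): the U-Net query attends to text features and to subject
    features through separate key/value projections, and the outputs are
    combined with a learnable fusion weight."""

    def __init__(self, dim: int, subj_dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        # Separate key/value projections for the text and subject streams,
        # reflecting the decoupling strategy described in the abstract.
        self.to_k_text = nn.Linear(dim, dim, bias=False)
        self.to_v_text = nn.Linear(dim, dim, bias=False)
        self.to_k_subj = nn.Linear(subj_dim, dim, bias=False)
        self.to_v_subj = nn.Linear(subj_dim, dim, bias=False)
        # Hypothetical scalar that balances subject features against text.
        self.fusion_scale = nn.Parameter(torch.tensor(1.0))

    def _attend(self, q, k, v):
        b, n, d = q.shape
        h = self.heads
        # Split into heads: (b, seq, d) -> (b, h, seq, d // h).
        q, k, v = (t.reshape(b, -1, h, d // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        return out.transpose(1, 2).reshape(b, n, d)

    def forward(self, x, text_feats, subj_feats):
        q = self.to_q(x)
        text_out = self._attend(q, self.to_k_text(text_feats), self.to_v_text(text_feats))
        subj_out = self._attend(q, self.to_k_subj(subj_feats), self.to_v_subj(subj_feats))
        # Fuse the text-conditioned and subject-conditioned outputs.
        return text_out + self.fusion_scale * subj_out


if __name__ == "__main__":
    # Toy shapes: 2 samples, 64 latent tokens, 77 text tokens, 16 subject tokens.
    attn = FusedCrossAttention(dim=320, subj_dim=768)
    x = torch.randn(2, 64, 320)
    text = torch.randn(2, 77, 320)
    subj = torch.randn(2, 16, 768)
    print(attn(x, text, subj).shape)  # torch.Size([2, 64, 320])
```

Keeping the subject stream behind its own projections and a tunable weight lets the subject conditioning be strengthened or weakened without retraining the text pathway, which is the usual motivation for this kind of decoupled design.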