CAF-VTON: Cross-Attention Layered Fusion Based Latent Diffusion Virtual Try-On
Abstract
Virtual try-on technology has gained significant attention in e-commerce, digital retail, and virtual reality due to its ability to enhance user experience and reduce return rates. However, generating accurate and natural virtual try-on results remains challenging, especially when dealing with complex human poses and clothing deformations. In this paper, we propose CAF-VTON, a novel latent diffusion-based virtual try-on network that introduces cross-attention layered fusion for the first time in virtual try-on tasks. By leveraging a layered cross-attention mechanism, CAF-VTON can progressively extract both local and global features of human poses and clothing, enabling the capture of fine details essential for realistic virtual try-on. Here we show that CAF-VTON outperforms existing methods on high-resolution datasets, achieving state-of-the-art performance in terms of realism, detail fidelity, and pose consistency. Our work paves the way for more advanced virtual try-on solutions, offering broader applications in the fashion and retail industries.
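The abstract does not specify how the layered cross-attention fusion is built, so the following is only a minimal sketch of the general idea, not the CAF-VTON architecture. It assumes that person (pose) tokens act as queries and that garment features are available at several levels, each serving as keys and values. It omits the learned query/key/value projections a real model would have. The names `cross_attention`, `layered_fusion`, `person_tokens`, and `garment_levels` are hypothetical and chosen for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query token attends over all key tokens and returns a
    weighted sum of the corresponding value tokens.
    No learned projections are applied (assumption for brevity).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return out


def layered_fusion(person_tokens, garment_levels):
    """Hypothetical layered fusion.

    Applies one cross-attention step per garment feature level and adds
    the result back to the person tokens (residual connection), so that
    garment information is injected progressively, level by level.
    """
    fused = person_tokens
    for garment_tokens in garment_levels:
        attended = cross_attention(fused, garment_tokens, garment_tokens)
        fused = [
            [f + a for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(fused, attended)
        ]
    return fused


# Toy example: two person tokens, two garment feature levels.
person = [[1.0, 0.0], [0.0, 1.0]]
levels = [
    [[1.0, 0.0], [0.0, 1.0]],  # finer level: two garment tokens
    [[0.5, 0.5]],              # coarser level: one pooled garment token
]
fused = layered_fusion(person, levels)
```

In this sketch the finer level stands in for local detail and the coarser level for global context; how CAF-VTON actually defines and orders its layers inside the latent diffusion network is not stated in the abstract.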