CAF-VTON: Cross-Attention Layered Fusion Based Latent Diffusion Virtual Try-On

Abstract

Virtual try-on technology has gained significant attention in e-commerce, digital retail, and virtual reality because it can enhance user experience and reduce return rates. However, generating accurate and natural try-on results remains challenging, especially for complex human poses and clothing deformations. In this paper, we propose CAF-VTON, a novel latent diffusion-based virtual try-on network that introduces cross-attention layered fusion to virtual try-on tasks for the first time. Using a layered cross-attention mechanism, CAF-VTON progressively extracts both local and global features of human poses and clothing, capturing the fine details essential for realistic virtual try-on. We show that CAF-VTON outperforms existing methods on high-resolution datasets, achieving state-of-the-art performance in realism, detail fidelity, and pose consistency. Our work paves the way for more advanced virtual try-on solutions with broader applications in the fashion and retail industries.
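The abstract does not give implementation details, so the following is only a minimal NumPy sketch of what a layered cross-attention fusion could look like, not the authors' architecture. It assumes person (pose) tokens act as queries and garment tokens at several resolutions act as keys and values, with one cross-attention step per level and a residual connection. The names `cross_attention` and `layered_fusion`, the token counts, and the random projection matrices are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(person, garment, w_q, w_k, w_v):
    """Scaled dot-product cross-attention.

    person:  (n_person, d) tokens that issue queries (assumed role)
    garment: (n_garment, d) tokens that supply keys and values (assumed role)
    """
    q = person @ w_q
    k = garment @ w_k
    v = garment @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def layered_fusion(person, garment_levels, params):
    """Fuse garment features level by level (coarse to fine, by assumption)."""
    fused = person
    for garment, (w_q, w_k, w_v) in zip(garment_levels, params):
        attended, _ = cross_attention(fused, garment, w_q, w_k, w_v)
        fused = fused + attended  # residual connection keeps person features
    return fused


# Toy data: 16 person tokens, garment tokens at three hypothetical scales.
rng = np.random.default_rng(0)
d = 8
person = rng.normal(size=(16, d))
garment_levels = [rng.normal(size=(n, d)) for n in (4, 16, 64)]
params = [
    tuple(rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    for _ in garment_levels
]

fused = layered_fusion(person, garment_levels, params)
_, attn = cross_attention(person, garment_levels[0], *params[0])
```

In this sketch the coarse level contributes few garment tokens (global context) and the finer levels contribute many (local detail), which is one plausible reading of "progressively extract both local and global features"; the actual model operates in the latent space of a diffusion network and may differ substantially.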
