Hybrid Diffusion Framework for Realistic Virtual Garment Try-On
Abstract
Image-based virtual try-on (VTON) has emerged as a pivotal challenge in visual computing, aiming to realistically depict individuals wearing target garments while preserving structural alignment and visual consistency. Recent diffusion-based generative models have shown promise in image synthesis; however, challenges persist in maintaining garment texture fidelity, pose coherence, and synthesis stability. This study introduces IMAGDressing, a diffusion-driven VTON framework that integrates pretrained latent diffusion models with pose-guided and garment-conditioning strategies. The framework combines garment feature encoding, human pose estimation, and attention-based conditioning within a frozen denoising backbone to enhance garment alignment and perceptual realism without extensive task-specific retraining. Experimental evaluations on VTON benchmark datasets demonstrate competitive visual quality and consistent garment preservation, with an FID of 8.54, SSIM of 0.90, and LPIPS of 0.07 on the VITON dataset, and an FID of 9.58, SSIM of 0.89, and LPIPS of 0.07 on the VITON-HD dataset. Here, we show that diffusion-based conditioning mechanisms offer a viable path for controllable virtual try-on generation, highlighting practical considerations for scalable visual computing applications.

The source code, pretrained models, and implementation details are publicly accessible via the GitHub repository at https://github.com/Sahasra75/IMAGDressing-VTON, with a permanently archived and citable version available at Zenodo (DOI: https://doi.org/10.5281/zenodo.19232693).
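To make the conditioning mechanism concrete, the sketch below illustrates one way attention-based garment conditioning could be attached to a frozen denoising backbone: garment tokens from an image encoder are injected into the backbone's hidden states via a trainable cross-attention adapter, leaving the pretrained weights untouched. All names and dimensions here (GarmentCrossAttention, garment_dim, token counts) are illustrative assumptions, not the paper's actual implementation, which is available in the linked repository.

```python
import torch
import torch.nn as nn

# Minimal sketch of attention-based garment conditioning over a frozen
# denoising backbone. Module and parameter names are hypothetical.

class GarmentCrossAttention(nn.Module):
    """Injects garment features into backbone hidden states via cross-attention.

    The backbone (e.g., a pretrained latent-diffusion UNet) stays frozen;
    only this adapter would be trained, avoiding full task-specific
    retraining of the diffusion model.
    """

    def __init__(self, hidden_dim: int = 320, garment_dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            hidden_dim, heads, kdim=garment_dim, vdim=garment_dim, batch_first=True
        )
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, hidden: torch.Tensor, garment_feats: torch.Tensor) -> torch.Tensor:
        # hidden:        (B, N, hidden_dim)  spatial tokens from a backbone block
        # garment_feats: (B, M, garment_dim) tokens from a garment feature encoder
        attended, _ = self.attn(self.norm(hidden), garment_feats, garment_feats)
        # Residual injection: the frozen backbone's signal passes through unchanged.
        return hidden + attended


if __name__ == "__main__":
    adapter = GarmentCrossAttention()
    hidden = torch.randn(2, 64 * 64, 320)   # latent tokens for a 64x64 feature map
    garment = torch.randn(2, 77, 768)       # encoded garment tokens
    out = adapter(hidden, garment)
    print(out.shape)  # torch.Size([2, 4096, 320])
```

Under this kind of design, pose guidance (e.g., from a human pose estimator) would typically enter through a separate conditioning path, so garment appearance and body structure can be controlled independently at inference time.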