Hybrid Diffusion Framework for Realistic Virtual Garment Try-On
Abstract
Image-based virtual try-on (VTON) has emerged as a pivotal challenge in visual computing, aiming to realistically depict individuals wearing target garments while preserving structural alignment and visual consistency. Recent diffusion-based generative models have shown promise in image synthesis; however, challenges persist in maintaining garment texture fidelity, pose coherence, and synthesis stability. This study introduces IMAGDressing, a diffusion-driven VTON framework that integrates pretrained latent diffusion models with pose-guided and garment-conditioning strategies. The framework combines garment feature encoding, human pose estimation, and attention-based conditioning within a frozen denoising backbone to enhance garment alignment and perceptual realism without extensive task-specific retraining. Experimental evaluations on VTON benchmark datasets demonstrate competitive visual quality and consistent garment preservation, with an FID of 8.54, SSIM of 0.90, and LPIPS of 0.07 on the VITON dataset, and an FID of 9.58, SSIM of 0.89, and LPIPS of 0.07 on the VITON-HD dataset. Here, we show that diffusion-based conditioning mechanisms offer a viable path for controllable virtual try-on generation, highlighting practical considerations for scalable visual computing applications.

The source code, pretrained models, and implementation details are publicly accessible via the GitHub repository at https://github.com/Sahasra75/IMAGDressing-VTON, with a permanently archived and citable version available at Zenodo (DOI: https://doi.org/10.5281/zenodo.19232693).
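To make the conditioning mechanism concrete, the sketch below illustrates one way attention-based garment conditioning could be attached to a frozen denoising backbone: garment tokens from an image encoder are injected into the backbone's hidden states via a trainable cross-attention adapter, leaving the pretrained weights untouched. All names and dimensions here (GarmentCrossAttention, garment_dim, token counts) are illustrative assumptions, not the paper's actual implementation, which is available in the linked repository.

```python
import torch
import torch.nn as nn

# Minimal sketch of attention-based garment conditioning over a frozen
# denoising backbone. Module and parameter names are hypothetical.

class GarmentCrossAttention(nn.Module):
    """Injects garment features into backbone hidden states via cross-attention.

    The backbone (e.g., a pretrained latent-diffusion UNet) stays frozen;
    only this adapter would be trained, avoiding full task-specific
    retraining of the diffusion model.
    """

    def __init__(self, hidden_dim: int = 320, garment_dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            hidden_dim, heads, kdim=garment_dim, vdim=garment_dim, batch_first=True
        )
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, hidden: torch.Tensor, garment_feats: torch.Tensor) -> torch.Tensor:
        # hidden:        (B, N, hidden_dim)  spatial tokens from a backbone block
        # garment_feats: (B, M, garment_dim) tokens from a garment feature encoder
        attended, _ = self.attn(self.norm(hidden), garment_feats, garment_feats)
        # Residual injection: the frozen backbone's signal passes through unchanged.
        return hidden + attended


if __name__ == "__main__":
    adapter = GarmentCrossAttention()
    hidden = torch.randn(2, 64 * 64, 320)   # latent tokens for a 64x64 feature map
    garment = torch.randn(2, 77, 768)       # encoded garment tokens
    out = adapter(hidden, garment)
    print(out.shape)  # torch.Size([2, 4096, 320])
```

Under this kind of design, pose guidance (e.g., from a human pose estimator) would typically enter through a separate conditioning path, so garment appearance and body structure can be controlled independently at inference time.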