Enhancing Quantum Diffusion Models for Complex Image Generation
Abstract
Quantum generative models offer a novel approach to exploring high-dimensional Hilbert spaces but face significant challenges in scalability and expressibility, particularly when applied to multi-modal distributions. In this study, we propose a Hybrid Quantum-Classical U-Net architecture enhanced by Adaptive Non-Local Observables (ANO) and an Ancilla-based Global Feature Extractor. By compressing classical data into a dense quantum latent space and utilizing trainable observables, our model extracts rich non-local features that complement classical processing. Furthermore, we integrate a Hadamard Test module to capture global structural information, fusing it with dense local features. We also investigate the role of skip connections in preserving semantic information during the reverse diffusion process. Experimental results on the full MNIST dataset (digits 0-9) demonstrate that the proposed architecture generates structurally coherent and recognizable images for all digit classes, overcoming the mode collapse observed in prior quantum diffusion models. While hardware constraints necessitate resolution downscaling, our findings suggest that hybrid architectures with adaptive measurements provide a feasible pathway for enhancing generative capabilities in the NISQ era.
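To illustrate the Hadamard Test primitive mentioned above for extracting global information: an ancilla qubit prepared in superposition controls an application of a unitary U to the data register, and the ancilla's measurement statistics reveal the real part of the expectation value ⟨ψ|U|ψ⟩. The following is a minimal exact-simulation sketch in NumPy; the function name and setup are illustrative only and do not reflect the paper's actual implementation or circuit parameters.

```python
import numpy as np

def hadamard_test_real(U, psi):
    """Exactly simulate a Hadamard test to compute Re<psi|U|psi>.

    Circuit: ancilla |0> -> H -> controlled-U on |psi> -> H -> measure.
    Then P(ancilla = 0) = (1 + Re<psi|U|psi>) / 2.
    """
    n = len(psi)
    # Joint state: ancilla (first factor) tensor system, ancilla in |0>
    state = np.kron(np.array([1.0, 0.0]), psi).astype(complex)
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    I = np.eye(n)
    # Controlled-U: identity on the |0> ancilla branch, U on the |1> branch
    CU = np.block([[I, np.zeros((n, n))],
                   [np.zeros((n, n)), U]])
    state = np.kron(H, I) @ state   # put ancilla in superposition
    state = CU @ state              # entangle ancilla with U|psi>
    state = np.kron(H, I) @ state   # interfere the two branches
    p0 = np.sum(np.abs(state[:n]) ** 2)  # probability ancilla reads 0
    return 2 * p0 - 1               # recovers Re<psi|U|psi>

# Example: for |psi> = |+> and U = Z, Re<+|Z|+> = 0
psi = np.array([1.0, 1.0]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])
print(hadamard_test_real(Z, psi))  # ≈ 0.0
```

On hardware the probability P(ancilla = 0) would be estimated from repeated shots rather than computed exactly, and the trainable observables described in the abstract would parameterize which U (or measurement basis) is applied.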