Dynamic and Mixed-Precision Techniques for Scalable Iterative Generative Modeling
Abstract
The rapid proliferation of large-scale diffusion models has catalyzed significant advances in generative artificial intelligence, enabling high-fidelity synthesis across image, video, audio, and multimodal domains. Despite their impressive capabilities, these models impose substantial computational and memory demands, posing critical challenges for deployment, scalability, and energy efficiency. Quantization and low-precision techniques have emerged as essential strategies for addressing these constraints by reducing the numerical precision of model parameters, activations, and intermediate computations. However, unlike conventional feedforward or discriminative networks, diffusion models are uniquely sensitive to quantization because of their iterative denoising process, hierarchical architecture, and reliance on high-dimensional latent representations. Minor perturbations at early timesteps or in error-prone layers can accumulate across iterations, leading to substantial degradation in generative quality, perceptual fidelity, and semantic consistency. This survey provides a comprehensive examination of the state of the art in quantization for diffusion models, encompassing the mathematical foundations of error propagation, probabilistic modeling of quantization effects, and theoretical frameworks for precision allocation. We systematically categorize quantization strategies, including post-training quantization, quantization-aware training, mixed-precision approaches, timestep-adaptive schemes, and hybrid methodologies, highlighting their respective advantages, limitations, and hardware implications. Architectural considerations are explored in depth, focusing on layer-wise and module-specific sensitivities, attention mechanisms, residual connections, normalization layers, and hierarchical feature scales, all of which influence the optimal distribution of precision. Evaluation protocols and benchmarking strategies are discussed, integrating statistical, perceptual, and hardware-aware metrics, as well as sensitivity analyses that guide bitwidth assignment and adaptive precision techniques. We also address open challenges such as error accumulation, multimodal interactions, hardware co-design, integration with complementary compression techniques, and the development of robust, scalable, and task-specific quantization frameworks. Finally, we outline emerging research directions, including dynamic and input-adaptive quantization, architecture-aware methods, theoretical analysis of cumulative quantization error, and real-time deployment considerations for foundation-scale models. By synthesizing insights from algorithmic design, numerical analysis, hardware optimization, and evaluation methodologies, this survey offers a unified perspective on the current landscape and future potential of low-precision diffusion models, along with a roadmap for efficient, high-fidelity, and widely deployable generative AI systems.
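To make the compounding-error claim concrete, consider a generic worked bound, introduced here purely for illustration; the symbols f_k, L_k, \varepsilon_k, and b_k are not from the original text. Model each denoising step as a Lipschitz map perturbed by additive quantization noise:

\[
x_{k+1} = f_k(x_k) + \varepsilon_k, \qquad \|f_k(u) - f_k(v)\| \le L_k \|u - v\|, \qquad k = 0, \dots, T-1.
\]

Unrolling the recursion for the accumulated error e_T between the quantized and full-precision trajectories gives

\[
\|e_T\| \;\le\; \sum_{k=0}^{T-1} \Big( \prod_{j=k+1}^{T-1} L_j \Big) \|\varepsilon_k\|,
\]

so noise injected at an early step is amplified by the product of every subsequent Lipschitz constant, which is one way to see why early-timestep perturbations can dominate the final error. Under b_k-bit uniform quantization over a range of width R, the per-step perturbation satisfies \|\varepsilon_k\|_\infty \le R / \bigl(2(2^{b_k} - 1)\bigr), making the bound an explicit function of the per-step bitwidth and motivating timestep-adaptive precision allocation.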
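As a complementary illustration of the timestep-adaptive schemes surveyed above, the following minimal Python sketch simulates ("fake") quantization at a bitwidth that varies with the sampling step. It is a toy under stated assumptions, namely uniform min-max quantization, a two-level bit schedule, and a random surrogate tensor in place of real activations, and is not an implementation of any specific method discussed in the survey.

    import numpy as np

    def fake_quantize(x, bits):
        # Simulate uniform min-max quantization of x at the given bitwidth.
        lo, hi = float(x.min()), float(x.max())
        if hi == lo:
            return x.copy()
        step = (hi - lo) / (2 ** bits - 1)      # quantization step size (Delta)
        return np.round((x - lo) / step) * step + lo

    def bits_for_timestep(t, num_steps, hi_bits=8, lo_bits=4, frac_high=0.3):
        # Toy schedule: spend more bits on the early, error-sensitive steps.
        return hi_bits if t < frac_high * num_steps else lo_bits

    # Demo: quantize a surrogate activation tensor at each sampling step
    # and report the worst-case elementwise error introduced.
    rng = np.random.default_rng(0)
    num_steps = 10
    x = rng.standard_normal((4, 4))
    for t in range(num_steps):
        b = bits_for_timestep(t, num_steps)
        err = np.abs(fake_quantize(x, b) - x).max()
        print(f"step {t:2d}: bits={b}, max abs quantization error={err:.4f}")

In a real sampler the per-step errors would feed back through the denoiser, which is exactly the accumulation the bound above describes; here each step is quantized independently only to show how the bit schedule changes the magnitude of the injected noise.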