Strategies for Deploying High-Fidelity Generative Diffusion Models at Scale under Computational and Energy Constraints

Abstract

Generative diffusion models have emerged as a powerful class of probabilistic models capable of synthesizing high-fidelity data across diverse domains, including images, audio, video, and multimodal content. Their iterative denoising processes, grounded in stochastic differential equations and Markovian transitions, enable them to learn complex data distributions with remarkable accuracy. However, deploying these models in practical, large-scale applications is severely constrained by their computational and memory requirements, particularly in big data environments where datasets are massive, heterogeneous, and continuously evolving. Quantization, the process of reducing the numerical precision of model parameters and activations, has recently gained attention as a crucial strategy for mitigating these challenges, offering substantial reductions in memory footprint, computational overhead, and energy consumption while largely preserving generative fidelity. This survey provides a comprehensive analysis of quantization strategies for generative diffusion models, spanning post-training quantization, quantization-aware training, mixed-precision schemes, dynamic and adaptive bitwidth methods, and hybrid approaches that integrate complementary compression techniques such as pruning, low-rank factorization, and weight clustering. We systematically explore the mathematical foundations of quantization in the context of iterative denoising, formalizing error propagation, step-dependent sensitivity, and the stochastic effects induced by low-precision arithmetic. Furthermore, we examine the system-level and hardware-aware implications of quantization, including memory alignment, tensor-core acceleration, cache utilization, distributed computation, and energy efficiency, highlighting the trade-offs that arise in heterogeneous big data pipelines. The survey also emphasizes challenges unique to generative diffusion models, such as the amplification of quantization noise across timesteps, sensitivity to out-of-distribution and heterogeneous datasets, robustness to adversarial inputs or rare events, and the complex interactions between precision reduction and model architecture. We review evaluation metrics and benchmarking strategies for quantized diffusion models, discussing traditional measures such as Fréchet Inception Distance and Inception Score alongside perceptual fidelity metrics, diversity assessments, robustness analyses, and hardware-aware efficiency measures. Finally, we outline open research directions and emerging trends, including adaptive and data-dependent quantization policies, integration with complementary compression methods, cross-platform optimization, fairness and robustness assurance, and energy-efficient design. By synthesizing advances across algorithmic, mathematical, hardware, and system-level perspectives, this survey provides a holistic framework for understanding, evaluating, and deploying quantized generative diffusion models in big data contexts, offering guidance to researchers and practitioners seeking to balance computational efficiency with high-fidelity generative performance.
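As an illustrative aside on the error propagation mentioned above, one common way to formalize it (our choice here, not a result claimed by any single surveyed work) is via Lipschitz composition. Suppose each reverse step applies a denoising map $f_\theta(\cdot, t)$ with Lipschitz constant $L_t$, and low-precision arithmetic perturbs that step by $\epsilon_t^{q}$, so $\hat{x}_{t-1} = f_\theta(\hat{x}_t, t) + \epsilon_t^{q}$. Unrolling the recursion over $T$ steps gives the worst-case bound

$$
\lVert \hat{x}_0 - x_0 \rVert \;\le\; \sum_{t=1}^{T} \Bigl( \prod_{s=1}^{t-1} L_s \Bigr) \lVert \epsilon_t^{q} \rVert,
$$

which shows why quantization noise injected at early (large-$t$) steps can be amplified through all remaining steps whenever $L_s > 1$, consistent with the step-dependent sensitivity discussed above.

The minimal sketch below demonstrates the same effect numerically: a random linear map stands in for the noise-prediction network, its weights pass through a uniform symmetric post-training quantizer, and the full-precision and quantized reverse trajectories are compared step by step. Every ingredient (the `quantize` and `step` helpers, the 4-bit width, the DDIM-flavored update) is a hypothetical stand-in chosen for brevity, not a method from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, bits):
    """Toy per-tensor uniform symmetric quantizer: snap values to a
    grid of 2**bits levels spanning [-max|x|, +max|x|]."""
    qmax = 2.0 ** (bits - 1) - 1.0
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale

# A fixed random linear map stands in for the denoiser eps_theta(x_t, t).
dim = 64
W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
W_q = quantize(W, bits=4)  # low-precision copy of the weights

def step(x, weights, alpha=0.9):
    """One simplified deterministic reverse step: blend the current
    state with the model's estimate (a DDIM-flavored toy update)."""
    return alpha * x + (1.0 - alpha) * (weights @ x)

x_fp = rng.standard_normal(dim)  # shared initial noise sample
x_q = x_fp.copy()

for t in range(50):
    x_fp = step(x_fp, W)    # full-precision trajectory
    x_q = step(x_q, W_q)    # 4-bit-weight trajectory
    if t % 10 == 9:
        drift = np.linalg.norm(x_fp - x_q) / np.linalg.norm(x_fp)
        print(f"after step {t + 1:2d}: relative drift = {drift:.4f}")
```

On a typical run the relative drift grows with the number of steps, the same qualitative amplification of quantization noise across timesteps that motivates the step-aware and mixed-precision schemes reviewed in this survey.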
