Towards Sustainable Image Synthesis: A Comprehensive Review of Text-to-Image Generation Models

Abstract

Text-to-image generation is a rapidly evolving frontier in artificial intelligence, enabling the transformation of natural language descriptions into visually coherent and semantically rich images. This paper presents a comprehensive review of state-of-the-art generative models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models, focusing on their ability to produce high-fidelity, contextually accurate images from textual inputs. We also analyse leading text-to-image frameworks such as DALL-E 2, Stable Diffusion, Imagen, and MidJourney, assessing their advances in image quality, semantic alignment, diversity, and computational efficiency. Our systematic evaluation highlights significant progress in generating realistic, high-resolution images while identifying persistent challenges related to semantic consistency, fine-grained control, ethical considerations, and substantial computational demands. We further discuss critical trade-offs between model performance and sustainability, and outline future research directions aimed at developing more efficient, fair, and environmentally responsible text-to-image generation systems. This survey is intended as a guiding resource for the next generation of sustainable, AI-driven text-to-image synthesis technologies.