Deep Generative Models for 3D Content Creation: A Comprehensive Survey of Architectures, Challenges, and Emerging Trends

Kaiqi Chen
Libby Ramsey


Abstract

3D model generation has become essential across many industries, including gaming, virtual and augmented reality (VR/AR), architecture, and medical imaging. Traditionally reliant on manual effort, 3D content creation is now being transformed by deep generative models, which enable more efficient, scalable, and dynamic generation of complex shapes and environments. This survey provides a comprehensive review of the key backbone architectures used for 3D generation, including autoencoders, variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, diffusion models, normalizing flows, attention-based models, CLIP-guided models, and procedural generation techniques. We explore each model's role in 3D generation, highlighting its strengths (such as the precision of VAEs, the realism of GANs, the stability of diffusion models, and the scalability of procedural methods) alongside its limitations, such as training instability, high computational cost, and difficulty handling multi-modal data. We also discuss the growing relevance of attention-enhanced models and the use of text-based CLIP supervision to improve semantic alignment in 3D outputs. The survey concludes with an analysis of open challenges, including balancing efficiency with expressiveness, managing training complexity, and addressing dataset limitations. It also identifies future research directions, such as few-shot learning, hybrid architectures, and neural-symbolic approaches, that promise to advance the field by improving the generalization and versatility of 3D generation models. This paper aims to guide researchers and practitioners through the evolving landscape of 3D generative methods and to inspire new work on creating realistic, high-quality 3D content.

Version published to 10.20944/preprints202410.2397.v1
Oct 30, 2024

Related articles

Large Language and Generative Foundation Models for Cloud-Enabled Medical Imaging and Internet of Medical Things: A PRISMA Systematic Review of Architectures, Security, and Deployment

This article has 3 authors:
1. Behnam Kiani Kalejahi
2. Kamila Khalimova
3. Mohammad Javad Rajabi
This article has no evaluations. Latest version Mar 6, 2026.

3D-to-4D Gaussian Scene Generation with Text-guided Diffusion

This article has 1 author:
1. Neil De La Fuente
This article has no evaluations. Latest version Mar 2, 2026.

Benchmarking Conditional GANs in Industrial Marble Texture Synthesis via a Dual-Evaluation Framework

This article has 5 authors:
1. António Alves de Campos
2. Margarida Figueiredo
3. Carlos M. A. Diogo
4. Gustavo Paneiro
5. Pedro Amaral
This article has no evaluations. Latest version Feb 10, 2026.