3D-to-4D Gaussian Scene Generation with Text-guided Diffusion
Abstract
The advent of 3D Gaussian Splatting (3DGS) has enabled real-time, photorealistic rendering of static 3D scenes. The next frontier is to instill these static worlds with dynamic, controllable motion, a task central to the future of immersive media and simulation. A promising paradigm for this 4D content creation is to leverage the vast generative power of pre-trained text-to-video diffusion models (VDMs) to create motion priors, which can then be "lifted" into a temporally and spatially consistent 3D scene.

However, early frameworks that implement this paradigm, while conceptually powerful, often prove fragile and limited in practice. This thesis investigates the practical failure modes of this diffusion-lifting approach through a deep analysis of the Gaussians-to-Life (G2L) pipeline. We identify two critical bottlenecks that challenge the scalability and usability of such systems: (1) a restrictive temporal horizon imposed by the underlying VDM's architecture, limiting animations to fleeting, sub-second movements; and (2) a critical disconnect between the text prompt and the final motion, revealing a heavy reliance on manually cherry-picked guidance videos that undermines claims of true text-driven control.

In response, this thesis presents a methodology to enhance the robustness and capability of this paradigm. We demonstrate that replacing the pipeline's original U-Net-based VDM with a modern Diffusion Transformer (DiT), LTX-Video, directly addresses the temporal bottleneck, extending the viable animation horizon from 8 to 64 frames and yielding richer, higher-quality motion. Our work provides a more robust and scalable framework for future diffusion-based 3D-to-4D animation systems, showing a practical path from promising but fragile concepts to more functional and powerful creative tools.
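
For illustration only, the guidance-video stage described above could be driven by LTX-Video through the Hugging Face diffusers library. The sketch below is not the thesis's actual pipeline: the model ID, prompt, resolution, and sampler settings are assumptions, and 65 frames is used because LTX-Video expects frame counts of the form 8k+1, approximating the 64-frame horizon mentioned above.

# Minimal sketch (assumed setup, not the G2L/thesis code): generating a
# long text-conditioned guidance video with the LTX-Video DiT via diffusers.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Hypothetical scene prompt; in a 3D-to-4D pipeline this would describe the
# desired motion for the rendered static scene.
prompt = "a candle flame flickering gently in a dim room"

video = pipe(
    prompt=prompt,
    negative_prompt="worst quality, inconsistent motion, blurry, jittery",
    width=704,              # assumed resolution; dimensions must be divisible by 32
    height=480,
    num_frames=65,          # 8k+1 frames, i.e. roughly the 64-frame horizon
    num_inference_steps=50,
).frames[0]

export_to_video(video, "guidance_video.mp4", fps=24)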