Overcoming the Semantic Bottleneck for Deterministic Structural Control in Text-to-Image Synthesis

Abstract

Latent diffusion models are bottlenecked by their text encoders, which restrict geometric control and prevent deterministic structural steering. We propose Procedural Latent Prompt Injection, a zero-shot steerability framework in which generation is modelled as a steerable stochastic differential equation (SDE). The framework requires neither auxiliary training nor linguistic conditioning. A key component of the approach is the use of normalized tensor operators to embed geometric priors directly into the latent space. Empirical analysis of the diffusion trajectory indicates a critical plasticity window (timesteps 10–20) in which the noise patterns in the latent are most amenable to structural steering. Compared with baselines, the method improves structural alignment (CLIP) by 19.6% and diversity control by 12.3%. These results represent an improvement over prior learned control approaches (Prompt-to-Prompt and ControlNet) and offer a new deterministic paradigm for controlling generative models, with applications in the procedural synthesis of medical images, structural biology models, and physics simulations.
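The abstract does not specify the tensor operators or the injection schedule. As a rough illustration only, the sketch below assumes that a geometric prior (for example a binary shape mask) is standardized, rescaled to the latent's own mean and standard deviation, and linearly blended into a flattened latent only while the sampler step lies inside the reported 10–20 window. The names `normalize` and `inject_prior`, the `strength` blend weight, the inclusive window bounds, the step-index convention, and the use of plain Python lists are all hypothetical choices, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def normalize(tensor: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize a flattened tensor to zero mean and (approximately) unit variance."""
    n = len(tensor)
    mean = sum(tensor) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in tensor) / n)
    # eps guards against division by zero when the prior is constant
    return [(x - mean) / (std + eps) for x in tensor]


def inject_prior(
    latent: Sequence[float],
    prior: Sequence[float],
    step: int,
    window: Tuple[int, int] = (10, 20),  # assumed inclusive "plasticity window"
    strength: float = 0.3,  # hypothetical blend weight
) -> List[float]:
    """Hypothetical injection: blend a normalized geometric prior into the latent
    only while the sampler step lies inside the window; otherwise pass through."""
    if len(latent) != len(prior):
        raise ValueError("latent and prior must have the same number of elements")
    lo, hi = window
    if not (lo <= step <= hi):
        return list(latent)
    n = len(latent)
    mean = sum(latent) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in latent) / n)
    # Rescale the standardized prior to the latent's own statistics so the
    # injection keeps the latent's mean and acts on a comparable scale
    shaped = [mean + std * p for p in normalize(prior)]
    return [(1.0 - strength) * x + strength * s for x, s in zip(latent, shaped)]


if __name__ == "__main__":
    latent = [0.5, -1.2, 0.3, 2.0]  # toy flattened latent
    prior = [1.0, 0.0, 0.0, 1.0]    # toy geometric prior, e.g. a binary shape mask
    for step in (5, 15, 25):
        print(step, inject_prior(latent, prior, step))
```

Because the blend uses no random draws, repeated calls with the same latent, prior, and step return identical results, which is one simple reading of deterministic steering. In a real pipeline the latent would be a multi-channel tensor and the call would sit inside the sampler loop, between denoising steps.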
