Why world models fail under intervention: Ontological–causal separation as a necessary structure
Abstract
Learning world models that enable agents to understand, predict, and act within complex physical environments is a fundamental goal of embodied intelligence. However, many dominant approaches—such as latent video models, Dreamer-style imagination agents, and recent JEPA-based predictive architectures—operate on unstructured latent representations in which objects, relations, causality, and intention are implicitly entangled. While effective under observational evaluation, such representations provide limited support for intervention, planning, and counterfactual reasoning. This contrasts sharply with human cognition, which relies on structured reasoning over entities, attributes, and relations to understand and act in the world. In this work, we introduce LY-GWM (LingYang Graph World Model), a philosophy-inspired, graph-structured world model designed to examine a structural hypothesis: that explicit separation between ontological structure—what entities and relations constitute the world—and causal structure—how these entities evolve under actions—is a necessary condition for coherent world modeling. LY-GWM represents the environment as a dynamic graph composed of entities, attributes, and relations, and models causal dynamics as structured transformations over this graph. Building on this representation, LY-GWM incorporates three reasoning mechanisms inspired by long-standing philosophical traditions: causal dynamics for action-conditioned state transitions, teleological reasoning for goal-directed behavior, and dialectical novelty detection for identifying structural contradictions and unexplained changes. Through controlled diagnostic environments and high-fidelity humanoid simulation, we show that world models lacking explicit ontological–causal separation can perform well under observation yet systematically fail under intervention, whereas structured graph-based models maintain coherent reasoning across prediction, planning, and counterfactual analysis. Our results suggest that incorporating explicit ontological and causal structure is not merely a conceptual design choice, but a necessary condition for world models that support intervention-consistent reasoning in embodied intelligence.
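To make the representational claim concrete, the following minimal Python sketch illustrates what a dynamic graph of entities, attributes, and relations might look like, with an action applied as a local structural transformation. All names here (GraphState, Entity, apply_action, the pick_up action) are hypothetical illustrations of the idea described in the abstract, not the LY-GWM implementation.

from dataclasses import dataclass, field

# Hypothetical sketch: ontological structure as an explicit graph of entities,
# attributes, and typed relations; causal structure as local graph rewrites.
# Names are illustrative and are not taken from the LY-GWM codebase.

@dataclass
class Entity:
    name: str
    attributes: dict

@dataclass
class GraphState:
    entities: dict = field(default_factory=dict)   # entity name -> Entity
    relations: set = field(default_factory=set)    # (subject, predicate, object) triples

    def add_entity(self, name, **attributes):
        self.entities[name] = Entity(name, dict(attributes))

    def add_relation(self, subject, predicate, obj):
        self.relations.add((subject, predicate, obj))

def apply_action(state, action, actor, target):
    # Causal dynamics as a structured transformation: the action edits only the
    # attributes and relations it touches, leaving the rest of the ontology intact.
    if action == "pick_up":
        state.entities[target].attributes["held"] = True
        state.relations.discard((target, "on", "table"))
        state.relations.add((actor, "holding", target))
    return state

world = GraphState()
world.add_entity("robot", position=(0.0, 0.0))
world.add_entity("cup", position=(1.0, 0.5), held=False)
world.add_relation("cup", "on", "table")

world = apply_action(world, "pick_up", actor="robot", target="cup")
print(world.relations)   # {('robot', 'holding', 'cup')}

Under a representation of this kind, an intervention such as removing or relocating the cup changes only the corresponding node and its relations, which is exactly the locality that an entangled latent vector does not expose directly.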