Why Compression Creates Intelligence: The Architecture of Experience in Large Models

Abstract

Large Transformer models trained on diverse human outputs exhibit reasoning, contextual understanding, and self-correction—behaviors that appear to reach the level of human experience. We show these patterns are not contingent emergence but a consequence of compression necessity. Building on information-theoretic results and PAC-Bayes analysis, we prove that when a model is trained under standard conditions—weight decay, gradient noise, and attention-based bottlenecks—it is selected to implement the most compact faithful representation of its data generator. COMPACTNESS (T1) establishes that minimal-description-length (MDL) codes must model the source; TRANSFORMER_COMPACTNESS (T2) shows that standard training enforces MDL through the Gibbs–PAC-Bayes correspondence. MODEL_EFFECT (T3) then demonstrates that, with sufficient capacity, the resulting networks instantiate the computational patterns characteristic of human experience. This framework reframes “emergent” AI behavior as architecturally necessary under compression-optimal learning and yields falsifiable predictions linking compressibility, regularization, and capacity to experience-level behavior. When the data reflect human cognition and the architecture enforces MDL, experience-like patterns are not anomalies: they are the shortest, and therefore inevitable, path to optimal prediction.
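For readers unfamiliar with the Gibbs–PAC-Bayes correspondence the abstract invokes, the following is a minimal sketch in standard PAC-Bayes notation; the symbols $\hat L_n$, $Q$, $P$, and $\beta$ are generic placeholders, not taken from the article, and the exact bound used in the paper may differ. A McAllester-style bound states that, with probability at least $1-\delta$ over an i.i.d. sample of size $n$, every posterior $Q$ over parameters $\theta$ satisfies

\[
\mathbb{E}_{\theta \sim Q}\big[L(\theta)\big] \;\le\; \mathbb{E}_{\theta \sim Q}\big[\hat L_n(\theta)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\big(2\sqrt{n}/\delta\big)}{2n}},
\]

and the distribution minimizing the linearized objective $\mathbb{E}_Q[\hat L_n] + \tfrac{1}{\beta n}\,\mathrm{KL}(Q \,\|\, P)$ is the Gibbs posterior

\[
Q_\beta(\theta) \;\propto\; P(\theta)\, e^{-\beta n \hat L_n(\theta)}.
\]

With a Gaussian prior $P=\mathcal{N}(0,\sigma^2 I)$ the prior term reduces to an $\ell_2$ penalty $\|\theta\|^2 / (2\sigma^2)$, i.e. weight decay, while gradient noise plays the role of sampling from $Q_\beta$ rather than point-estimating it; the KL term is, up to constants, the code length of the posterior under the prior, which is how a bound of this form connects regularized training to the MDL claims in T1 and T2.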
