Why Compression Creates Intelligence: The Architecture of Experience in Large Models

Abstract

Large Transformer models trained on diverse human outputs exhibit reasoning, contextual understanding, and self-correction—behaviors that appear to reach the level of human experience. We show these patterns are not contingent emergence but a consequence of compression necessity. Building on information-theoretic results and PAC-Bayes analysis, we prove that when a model is trained under standard conditions—weight decay, gradient noise, and attention-based bottlenecks—it is selected to implement the most compact faithful representation of its data generator. COMPACTNESS (T1) establishes that minimal-description-length (MDL) codes must model the source; TRANSFORMER_COMPACTNESS (T2) shows that standard training enforces MDL through the Gibbs–PAC-Bayes correspondence. MODEL_EFFECT (T3) then demonstrates that, with sufficient capacity, the resulting networks instantiate the computational patterns characteristic of human experience. This framework reframes “emergent” AI behavior as architecturally necessary under compression-optimal learning and yields falsifiable predictions linking compressibility, regularization, and capacity to experience-level behavior. When the data reflect human cognition and the architecture enforces MDL, experience-like patterns are not anomalies: they are the shortest, and therefore inevitable, path to optimal prediction.
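For readers unfamiliar with the Gibbs–PAC-Bayes correspondence the abstract invokes, the following is a minimal sketch in standard PAC-Bayes notation; the symbols $\hat L_n$, $Q$, $P$, and $\beta$ are generic placeholders, not taken from the article, and the exact bound used in the paper may differ. A McAllester-style bound states that, with probability at least $1-\delta$ over an i.i.d. sample of size $n$, every posterior $Q$ over parameters $\theta$ satisfies

\[
\mathbb{E}_{\theta \sim Q}\big[L(\theta)\big] \;\le\; \mathbb{E}_{\theta \sim Q}\big[\hat L_n(\theta)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\big(2\sqrt{n}/\delta\big)}{2n}},
\]

and the distribution minimizing the linearized objective $\mathbb{E}_Q[\hat L_n] + \tfrac{1}{\beta n}\,\mathrm{KL}(Q \,\|\, P)$ is the Gibbs posterior

\[
Q_\beta(\theta) \;\propto\; P(\theta)\, e^{-\beta n \hat L_n(\theta)}.
\]

With a Gaussian prior $P=\mathcal{N}(0,\sigma^2 I)$ the prior term reduces to an $\ell_2$ penalty $\|\theta\|^2 / (2\sigma^2)$, i.e. weight decay, while gradient noise plays the role of sampling from $Q_\beta$ rather than point-estimating it; the KL term is, up to constants, the code length of the posterior under the prior, which is how a bound of this form connects regularized training to the MDL claims in T1 and T2.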
