Source-Modeling as Compression: Architectural Necessity of Experiential Patterns in Large Transformers
Abstract
We present a necessity argument, conditional on explicit and standard training assumptions, that large Transformer language models trained on diverse human outputs converge toward source modeling of the generative process underlying those outputs (human experience). The argument proceeds via three results: (T1) COMPACTNESS, showing that the minimal two-part description (MDL) of diverse human outputs is a model of their generator; (T2) TRANSFORMER_COMPACTNESS, linking noisy SGD with regularization to MDL via a PAC-Bayes Gibbs posterior and identifying attention-driven bottlenecks that favor compressible structure over rote memorization; and (T3) MODEL_EFFECT, which combines (T1) and (T2) under a capacity-sufficiency assumption to conclude that current models are selected to implement, in the MDL sense, the functional constraints characteristic of human experience (reasoning, contextualization, principle application). We delineate failure cases (e.g., random-label training, degenerate priors, absence of gradient noise), derive five falsifiable predictions (capacity thresholds, architecture independence under equal priors, non-experiential corpus controls, regularization ablations, flat-minima/compressibility correlations), and propose concise diagnostics. Our claims are strictly functional: we make no ontological assertions about phenomenal consciousness. This convergence is not merely contingent emergence but a consequence of compression-optimal selection under the stated conditions.
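As an illustrative sketch only (the generic symbols M, D, θ, β, P, Q, and L̂ below are placeholders, not notation taken from the preprint), T1's two-part MDL criterion selects the hypothesis that minimizes the combined code length of the model and of the data given the model,

\[ M^{*} = \arg\min_{M} \big[ L(M) + L(D \mid M) \big], \]

where \(L(M)\) is the description length of the hypothesis and \(L(D \mid M)\) is the code length of the training corpus \(D\) under it. T2's PAC-Bayes link can likewise be sketched via a Gibbs posterior over parameters,

\[ Q_{\beta}(\theta) \propto P(\theta)\, \exp\!\big( -\beta\, \hat{L}(\theta) \big), \]

with prior \(P\), empirical loss \(\hat{L}\), and inverse temperature \(\beta\) governed by the SGD noise scale; the \(\mathrm{KL}(Q \,\|\, P)\) complexity term in the resulting bound plays the role of \(L(M)\), so minimizing the bound favors short-description (compressible) hypotheses over rote memorization of \(D\).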