Source-Modeling as Compression: Architectural Necessity of Experiential Patterns in Large Transformers
Abstract
We present a necessity argument, conditional on explicit and standard training assumptions, that large Transformer language models trained on diverse human outputs converge toward source modeling of the generative process underlying those outputs (human experience). The argument proceeds via three results: (T1) COMPACTNESS, showing that the minimal two-part description (MDL) of diverse human outputs is a model of their generator; (T2) TRANSFORMER_COMPACTNESS, linking noisy SGD with regularization to MDL via a PAC-Bayes Gibbs posterior and identifying attention-driven bottlenecks that favor compressible structure over rote memorization; and (T3) MODEL_EFFECT, which combines (T1) and (T2) under a capacity-sufficiency assumption to conclude that current models are selected to implement, in the MDL sense, the functional constraints characteristic of human experience (reasoning, contextualization, principle application). We delineate failure cases (e.g., random-label training, degenerate priors, absence of gradient noise), derive five falsifiable predictions (capacity thresholds, architecture independence under equal priors, non-experiential corpus controls, regularization ablations, flat-minima/compressibility correlations), and propose concise diagnostics. Our claims are strictly functional: we make no ontological assertions about phenomenal consciousness. This convergence is not merely contingent emergence but a consequence of compression-optimal selection under the stated conditions.
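As an illustrative sketch only (the generic symbols M, D, θ, β, P, Q, and L̂ below are placeholders, not notation taken from the preprint), T1's two-part MDL criterion selects the hypothesis that minimizes the combined code length of the model and of the data given the model,

\[ M^{*} = \arg\min_{M} \big[ L(M) + L(D \mid M) \big], \]

where \(L(M)\) is the description length of the hypothesis and \(L(D \mid M)\) is the code length of the training corpus \(D\) under it. T2's PAC-Bayes link can likewise be sketched via a Gibbs posterior over parameters,

\[ Q_{\beta}(\theta) \propto P(\theta)\, \exp\!\big( -\beta\, \hat{L}(\theta) \big), \]

with prior \(P\), empirical loss \(\hat{L}\), and inverse temperature \(\beta\) governed by the SGD noise scale; the \(\mathrm{KL}(Q \,\|\, P)\) complexity term in the resulting bound plays the role of \(L(M)\), so minimizing the bound favors short-description (compressible) hypotheses over rote memorization of \(D\).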