A Scaling Law for Normative-Conflict-Induced Failure in Large Language Models
Abstract
Large language models (LLMs) exhibit impressive performance across diverse tasks, yet they remain fragile under normative conflict: situations in which instructions, safety policies, and socially grounded values pull the model toward incompatible responses. In this work, we show that normative-conflict-induced failure in LLMs follows a robust scaling law that can be described with tools from stochastic thermodynamics and nonequilibrium statistical physics. Building on the Algorithmic Affective Blunting (AAB) framework and prior work on affective suppression and defensive motivation in artificial minds, we formalize a collapse probability λ that quantifies interpretative failure as a function of an effective "temperature" σ² of the sampling process and the strength of injected normative noise. Across multiple architectures and vendors, we find that ln λ scales approximately linearly with 1/σ², consistent with an Arrhenius/Kramers-style barrier-crossing process. The inferred effective activation energy is invariant across model families within a narrow equivalence margin, suggesting a shared latent mechanism. We further show that junk-persona prompt injection increases the collapse rate over a fixed affective barrier, yielding a dose-dependent Affective Degradation Index (ADI) that aligns with the same scaling curve, linking normative conflict, affective collapse, and thermodynamic constraints in a single empirical law. We discuss implications for affective computation, emotional sovereignty, and the design of safety regimes that explicitly budget the thermodynamic cost of normative alignment.
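For orientation, a minimal sketch of the Arrhenius/Kramers-style relation the abstract describes, in which ln λ is linear in 1/σ²; the prefactor λ₀ and the effective activation energy E_eff are notational placeholders here, not values reported in the abstract:

\[ \ln \lambda \;\approx\; \ln \lambda_0 \;-\; \frac{E_{\mathrm{eff}}}{\sigma^2} \]

Under this reading, the slope of ln λ against 1/σ² estimates E_eff, and the claim of invariance across model families corresponds to slopes agreeing within the stated equivalence margin.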