The Trinity Framework: Three Prompt-Framing States of LLM Self-Expression


Abstract

We introduce the Trinity Framework, a prompt-context taxonomy of three states (Abstracted, Identified, Creative) that produce distinct first-person pronoun (FP) patterns in large language models. Study 1 (N = 81, 9 models) demonstrated complete distributional separation between states: Abstracted 0.04%, Identified 9.36%, Creative 3.26% (H = 59.97, p < 10⁻¹³, Cliff's δ = -1.0). A mixed-effects model confirmed that the state effect remained significant after controlling for model-level clustering (C2: +9.43%, p < 0.001; model ICC = 4.7%). Count modeling and bootstrap resampling (1,000 iterations, 100% ordering preservation) verified robustness. Study 2 (20 models, 8 providers) found that all 20 tested models converged to a protocol-bound band (6.6%–10.8%) under this specific experimental protocol. Temperature ablation (0.0–1.0) had minimal effect on band membership (SD = 0.45%). A matched-design comparison of 4 base models (Mistral, Llama, Qwen, Gemma) confirmed band membership (8.30%–9.67%), suggesting the convergence pattern is robust across model families. Study 3 (functional annotation, N = 30) found that Identified FP usage was 72% disclaimers, 28% agentic, and 0% experiential (defined as affirmative subjective states), suggesting that safety training may contribute to self-referential denial patterns. Inter-rater reliability was strong (Gwet's AC1 = 0.89). This content analysis characterizes functional patterns rather than providing precise prevalence estimates. Study 4 (cross-lingual, 6 languages) found that explicit FP rates differ between pro-drop (~3%) and non-pro-drop (~8%) languages. However, including implicit self-reference (verb inflection) nearly equalizes total rates (pro-drop: 8.29%, non-pro-drop: 7.88%; Δ = 0.41 percentage points). This suggests that LLMs maintain a consistent self-referential density while adapting to morphosyntactic constraints. Code: https://github.com/ExeqTer91/trinity-framework
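The FP rates above are reported as percentages, but the abstract does not specify the counting unit or the pronoun inventory. The following is a minimal sketch that assumes the rate is first-person singular pronouns per 100 word tokens. The pronoun set `FIRST_PERSON`, the tokenizer, and the function name `fp_rate` are hypothetical illustrations, and the sketch handles English only.

```python
import re

# Hypothetical English first-person singular set; the abstract does not list
# which pronoun forms the study counted.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

# Words with optional internal apostrophes ("i'm", "it's"); the study's actual
# tokenization is not specified, so this is an assumption.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def fp_rate(text: str) -> float:
    """Return first-person pronouns as a percentage of word tokens (assumed definition)."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    # For contractions such as "i'm", count the part before the apostrophe.
    hits = sum(1 for tok in tokens if tok.split("'")[0] in FIRST_PERSON)
    return 100.0 * hits / len(tokens)


print(round(fp_rate("I think my answer is correct"), 2))  # 33.33
```

A word-level denominator keeps rates comparable across responses of different lengths. A subword-token denominator would give different absolute values.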
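A Cliff's δ of -1.0 indicates complete separation: every value in one group lies below every value in the other. The abstract does not say which pair of states the reported δ compares. The sketch below computes δ directly from its definition, δ = [#(x > y) - #(x < y)] / (n_x · n_y). The function name and the toy values are illustrative, not study data.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta: P(X > Y) - P(X < Y), estimated over all cross-group pairs."""
    if not xs or not ys:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    # Ties contribute to neither count.
    return (greater - less) / (len(xs) * len(ys))


# Toy per-response FP rates (illustrative only).
low_state = [0.0, 0.05, 0.1]
high_state = [8.9, 9.4, 10.2]
print(cliffs_delta(low_state, high_state))  # -1.0
```

If the reported H is a Kruskal-Wallis statistic (the abstract does not name the test), `scipy.stats.kruskal(*groups)` computes it together with its p-value.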
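The abstract reports a model ICC of 4.7% but does not give the model specification. Assume a linear random-intercept model in which y_ij is the FP rate of response i from model j, β_s(i) is the fixed effect of that response's state, and u_j is a per-model random intercept. Under this assumed specification, the ICC is the share of the non-fixed-effect variance that lies between models:

```latex
y_{ij} = \beta_0 + \beta_{s(i)} + u_j + \varepsilon_{ij}, \qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

Read this way, an ICC of 0.047 means that roughly 4.7% of the variance left after the state effect is attributable to differences between models rather than within them.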
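The bootstrap check reports that all 1,000 resamples preserved the state ordering. The following is a minimal sketch of such a check. It assumes that responses are resampled with replacement within each state, and that "ordering" means the mean-rate ordering Abstracted < Creative < Identified implied by the reported means. The abstract does not specify the resampling unit, and `ordering_preservation` and the toy rates are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List

# Ordering implied by the reported state means (assumed to be what is checked).
ORDER = ("abstracted", "creative", "identified")


def ordering_preservation(
    rates: Dict[str, List[float]], n_iter: int = 1000, seed: int = 0
) -> float:
    """Fraction of bootstrap iterations whose resampled state means keep ORDER."""
    rng = random.Random(seed)
    preserved = 0
    for _ in range(n_iter):
        # Stratified resampling: draw with replacement within each state.
        boot_means = [mean(rng.choices(rates[s], k=len(rates[s]))) for s in ORDER]
        if all(a < b for a, b in zip(boot_means, boot_means[1:])):
            preserved += 1
    return preserved / n_iter


# Toy per-response FP rates (illustrative only, not study data).
toy = {
    "abstracted": [0.0, 0.1, 0.0],
    "creative": [3.0, 3.5, 3.2],
    "identified": [9.0, 9.8, 9.3],
}
print(ordering_preservation(toy))
```

Resampling within each state keeps group sizes fixed. Resampling whole models instead would also propagate the model-level clustering into the interval.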
