Latent Residual Drift Modulation through Temporal Contextual Pruning in Large Language Model Layer Transience

Abstract

Residual drift in autoregressive decoding introduces representational instability across hidden layers, complicating interpretability and constraining output consistency under long or fragmented input sequences. The proposed technique, Temporal Contextual Pruning, is an inference-only intervention guided entirely by internal divergence metrics; it neither alters the underlying architecture nor requires any form of human supervision. Attention scores within temporally localized windows are suppressed on a per-layer basis according to cosine shift thresholds and reconstruction error, decoupled from semantic content and task performance. Experimental evaluations on multilingual, code-synthesis, and logic-based prompts showed measurable suppression of late-layer drift while preserving sequence-level entropy and lexical diversity. Analysis of attention-head disruption patterns indicated localized activation volatility, with pruning exerting stronger effects on structurally unstable heads in deeper layers. Recurrent input configurations further amplified the divergence observed in baseline conditions, highlighting the role of cumulative residual saturation in representational decay. Despite a modest increase in latency, pruning maintained generalization across task types and introduced minimal deviation in output distributions. These findings suggest that internal activation metrics can serve as functional proxies for inference-time control in LLMs where task supervision is unavailable or undesired.
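The abstract describes pruning as layer-wise suppression of attention scores within a temporal window, triggered when an internal drift metric (e.g. cosine shift between consecutive hidden states) crosses a threshold. The sketch below illustrates this idea in a minimal, hypothetical form: `cosine_shift`, `prune_attention`, the threshold, and the window size are all illustrative assumptions, not the authors' implementation, and the reconstruction-error gate mentioned in the abstract is omitted for brevity.

```python
import numpy as np

def cosine_shift(h_prev, h_curr):
    # Cosine distance between consecutive layer hidden states,
    # used here as a toy proxy for residual drift.
    num = float(np.dot(h_prev, h_curr))
    den = np.linalg.norm(h_prev) * np.linalg.norm(h_curr) + 1e-9
    return 1.0 - num / den

def prune_attention(scores, drift, threshold=0.3, window=4):
    # scores: (heads, q_len, k_len) pre-softmax attention scores.
    # If the layer's drift metric exceeds the threshold, suppress the
    # most recent temporal window of keys by masking them before softmax.
    # Threshold and window values are illustrative, not from the paper.
    scores = scores.copy()
    if drift > threshold:
        scores[:, :, -window:] = -np.inf
    # Softmax over the key axis (masked positions get zero weight).
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)
```

Because the gate depends only on internal activations, the intervention needs no labels or task signal, matching the abstract's framing of activation metrics as supervision-free control proxies.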