K–R Excitation–Regulation Learning: A Stability-Driven Framework for Robust and Generalizable Vision Transformers
Abstract
Vision Transformers (ViTs) have achieved strong performance in visual recognition tasks, yet they often exhibit unstable representation dynamics, sensitivity to perturbations, and limited generalization under distribution shifts. These limitations arise from optimization processes that prioritize predictive accuracy without explicitly controlling feature evolution and stability. To address this gap, we propose a stability-driven learning framework termed K–R Excitation–Regulation Learning, which introduces nonlinear excitation and regulation mechanisms to guide representation dynamics toward equilibrium. The proposed framework models feature evolution as a dynamical process in which excitation enhances nonlinear feature interactions while regulation constrains representation drift, enabling stable embedding formation. A stability-constrained objective inspired by equilibrium principles is integrated into standard training, promoting balanced excitation–regulation behavior during learning. Unlike conventional architecture modifications, the K–R formulation directly governs representation dynamics, improving learning consistency and robustness. Extensive experiments demonstrate that the proposed method improves representation stability and generalization across multiple evaluation conditions. Specifically, K–R reduces feature drift, exhibits more controlled responses under noise perturbations, improves performance under distribution shifts and in few-shot learning settings, and shows superior scaling behavior as training progresses. Notably, these gains are achieved while maintaining predictive accuracy competitive with standard ViT baselines. These findings suggest that stability-driven learning offers a principled alternative to purely optimization-based training, enabling more robust and generalizable representation learning.
The K–R framework provides a new perspective on integrating dynamical systems principles into deep learning, highlighting the importance of controlled feature evolution for reliable visual recognition.
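To make the excitation–regulation idea concrete, the sketch below illustrates one plausible reading of a stability-constrained objective: a regulation term penalizing representation drift between successive feature states, offset by an excitation term rewarding bounded nonlinear feature energy. All names, the tanh-based excitation form, and the coefficient values are illustrative assumptions for exposition, not the authors' actual formulation.

```python
import numpy as np

def kr_stability_penalty(feat_prev, feat_curr, lam_excite=0.1, lam_reg=0.5):
    """Hypothetical K-R-style penalty (names and form are illustrative).

    Regulation: penalize representation drift between successive feature
    states. Excitation: reward bounded nonlinear feature energy (here a
    tanh-squared term). The penalty would be added to the task loss.
    """
    excitation = np.mean(np.tanh(feat_curr) ** 2)   # bounded nonlinear energy
    drift = np.mean((feat_curr - feat_prev) ** 2)   # representation drift
    return lam_reg * drift - lam_excite * excitation

# Toy check: features that stay near their previous state incur a smaller
# penalty than features that drift strongly between training steps.
rng = np.random.default_rng(0)
f0 = rng.normal(size=(4, 16))                  # previous feature state
stable = f0 + 0.01 * rng.normal(size=f0.shape)   # small drift
drifting = f0 + 1.0 * rng.normal(size=f0.shape)  # large drift
assert kr_stability_penalty(f0, stable) < kr_stability_penalty(f0, drifting)
```

Under this reading, the balance between `lam_reg` and `lam_excite` plays the role of the equilibrium the abstract describes: regulation alone would freeze representations, excitation alone would let them drift, and the combined objective steers feature evolution between the two.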