Functional Trust Regions (FTR): A Lagrangian Framework for Stability-Constrained Continual Learning
Abstract
The stability-plasticity tradeoff in continual learning is widely assumed to be architecture-dependent: models with higher loss-landscape curvature or larger parameter counts should require stronger regularization. We empirically challenge this assumption through extensive experiments on Functional Trust Regions (FTR), a method that enforces explicit KL-divergence constraints on functional drift during sequential task learning. Conducting 1,200 experiments across eight architectures spanning a twenty-four-fold parameter range and a forty-eight-fold variation in Hessian trace, we identify a stability crossover at ε* = 7.15 ± 0.35 (coefficient of variation: 4.96 percent) that is architecture-independent to within measurement precision. A formal F-test for constancy yields p = 0.786, indicating that between-architecture variance is statistically indistinguishable from measurement noise. Crucially, all ten tested curvature-based normalizations (including Hessian trace, Fisher trace, spectral norm, and effective dimensionality) increase cross-architecture dispersion rather than reduce it, and no curvature metric achieves a statistically significant correlation with ε* (all p > 0.06). Cross-method analysis reveals that Learning without Forgetting (LwF) exhibits moderately architecture-dependent transitions (coefficient of variation approximately 14 percent), while Elastic Weight Consolidation (EWC) shows no phase transition across four orders of magnitude of regularization strength. These results indicate that stability crossovers in distillation-based constrained learning arise from task structure rather than model geometry, and that widely accepted curvature-based intuitions fail to predict the critical stability budget.
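As a minimal sketch of the constrained objective the abstract describes (our notation, not necessarily the paper's: $f_{\theta_{t-1}}$ denotes the model after the previous task, $\mathcal{L}_{\text{task}}$ the current-task loss, and $\mathcal{D}$ the input distribution over which functional drift is measured):

$$
\min_{\theta}\ \mathcal{L}_{\text{task}}(\theta)
\quad \text{subject to} \quad
\mathbb{E}_{x \sim \mathcal{D}}\!\big[\mathrm{KL}\big(f_{\theta_{t-1}}(x)\,\big\|\,f_{\theta}(x)\big)\big] \le \varepsilon,
$$

handled via the Lagrangian $\mathcal{L}(\theta,\lambda) = \mathcal{L}_{\text{task}}(\theta) + \lambda\big(\mathbb{E}_{x}\big[\mathrm{KL}(f_{\theta_{t-1}}(x)\,\|\,f_{\theta}(x))\big] - \varepsilon\big)$ with multiplier $\lambda \ge 0$. On this reading, the crossover $\varepsilon^{*}$ reported above is the value of the stability budget $\varepsilon$ at which solutions transition between the stability-dominated and plasticity-dominated regimes.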