AIR: Activation based Isotropic Regularisation

Abstract

In deep learning, regularisation plays a significant role in preventing overfitting and improving model generalisation by controlling model complexity. Traditional approaches such as weight decay (L2 regularisation) primarily constrain the magnitude of the model parameters (weights), whereas more recent methods such as gradient variance regularisation (GVR) take gradients into account to stabilise the optimisation process. In this paper, we introduce Activation based Isotropic Regularisation (AIR), a novel regularisation approach that explicitly minimises the variance of activations, using the concept of subspaces corresponding to different training samples. AIR complements weight-based and gradient-based methods by promoting more stable feature representations, which in turn enhance generalisation. Furthermore, we propose a hybrid variant, AIR+L2, that combines AIR with traditional L2 weight decay. This combination leverages the strengths of both methods: AIR reduces feature-level fluctuations, while L2 prevents over-parameterisation. Extensive experiments on benchmark datasets using both MLP and CNN architectures demonstrate that AIR consistently improves model robustness and convergence, and that AIR+L2 achieves superior performance compared to either method in isolation.
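The abstract does not give the exact formulation of the AIR penalty, but the core idea of penalising the variance of activations across training samples can be sketched as follows. This is a minimal illustration, assuming the penalty is the mean per-feature variance of a hidden layer's activations over a mini-batch; the function name `air_penalty` and the coefficients `lam_air` and `lam_l2` are hypothetical, not taken from the paper.

```python
import numpy as np

def air_penalty(activations: np.ndarray) -> float:
    """Sketch of an activation-variance penalty (assumed form, not the
    paper's exact objective).

    activations: array of shape (batch, features) holding one hidden
    layer's outputs for a mini-batch. Returns the per-feature variance
    across the batch, averaged over features.
    """
    # Variance of each feature across training samples, then the mean.
    return float(np.mean(np.var(activations, axis=0)))

# Hypothetical usage: combine a task loss with AIR and L2 terms, as in
# the AIR+L2 variant described in the abstract.
rng = np.random.default_rng(0)
acts = rng.normal(size=(32, 64))      # stand-in hidden activations
weights = rng.normal(size=(64, 10))   # stand-in layer weights
task_loss = 1.0                       # placeholder task loss value
lam_air, lam_l2 = 0.1, 1e-4           # hypothetical coefficients
total_loss = (task_loss
              + lam_air * air_penalty(acts)
              + lam_l2 * float(np.sum(weights ** 2)))
```

In a training loop, the activations would typically be captured from intermediate layers during the forward pass, and `total_loss` backpropagated as usual.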
