Residual Connection Learning by Contextual Modulation Training in Modern Deep Neural Networks

Abstract

Residual connections are a cornerstone of modern deep neural networks, facilitating stable gradient propagation and maintaining representational expressiveness. Conventional residual formulations typically combine the identity mapping and the functional transformation with equal weight, ignoring the input-dependent importance of each component. This uniformity restricts the model's ability to adaptively regulate information flow. We propose Contextual Modulation Training (CoMT), a framework that introduces lightweight, input-dependent mechanisms to dynamically modulate the functional branches of residual connections. By scaling each transformation according to the incoming data, CoMT enables fine-grained, context-aware control over information flow through the architecture. The learned modulators provide finer control than prior fixed or hand-designed scaling techniques, improving representational flexibility at negligible cost to training scalability. CoMT applies broadly to architectures that employ residual connections, including ResNets and Transformers. Notably, the modulators are implemented as compact parametric functions, adding less than 1% extra parameters while consistently improving training performance. Empirical evaluations show that CoMT achieves 8%–11% perplexity reductions over baselines across four scales of LLaMA language models and yields substantial accuracy gains on three scales of ResNet models for image classification. Beyond these performance improvements, we provide clear evidence that the learned modulators effectively adapt layer-wise scaling. These findings establish CoMT as a general mechanism for context-sensitive modulation of residual connections.
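To make the core idea concrete, the following is a minimal sketch of an input-dependent modulated residual block, y = x + g(x) · F(x), where g is a lightweight gate conditioned on the input. All names, shapes, and the choice of a single sigmoid-gated linear projection for g are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ModulatedResidualBlock:
    """Illustrative residual block: y = x + g(x) * F(x).

    F is a small two-layer MLP (the functional branch); g is a compact
    parametric modulator (one weight vector + sigmoid) that produces an
    input-dependent scale in (0, 1) for the functional branch.
    """

    def __init__(self, dim, hidden):
        # Functional branch F: two linear maps with a ReLU in between.
        self.W1 = rng.standard_normal((dim, hidden)) * 0.02
        self.W2 = rng.standard_normal((hidden, dim)) * 0.02
        # Modulator g: adds only `dim` parameters, a tiny fraction of
        # the ~2*dim*hidden parameters in F (cf. the <1% overhead claim).
        self.w_gate = rng.standard_normal(dim) * 0.02

    def forward(self, x):
        f = np.maximum(x @ self.W1, 0.0) @ self.W2  # F(x)
        gate = sigmoid(x @ self.w_gate)[..., None]  # g(x), one scalar per input
        return x + gate * f                         # modulated residual sum

block = ModulatedResidualBlock(dim=16, hidden=32)
x = rng.standard_normal((4, 16))  # batch of 4 input vectors
y = block.forward(x)
print(y.shape)  # (4, 16)
```

Because the gate output lies in (0, 1) and depends on x, each input can attenuate or pass its functional branch to a different degree, unlike a fixed scalar scale shared across all inputs.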