Superposition in Transformers: A Novel Way of Building Mixture of Experts
Abstract
Catastrophic forgetting remains a major challenge when adapting large language models (LLMs) to new tasks or domains. Conventional fine-tuning often overwrites existing knowledge, degrading performance on the original tasks. We introduce _Superposition in Transformers_, a novel architecture that leverages autoencoders to superimpose the hidden representations of a base model and a fine-tuned model within a shared parameter space. By using B-spline-based blending coefficients and autoencoders that adaptively reconstruct hidden states based on the input data distribution, our method effectively mitigates catastrophic forgetting and enables a new paradigm of “in-model” superposition. This approach preserves the base model's capabilities while adding compact, domain-specific expertise, and it supports dynamic switching between model states at inference time.
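To make the idea concrete, the sketch below illustrates one plausible reading of the abstract, not the authors' implementation: hidden states from a base and a fine-tuned model are linearly superimposed with layer-wise blending coefficients drawn from a B-spline, and a small autoencoder reconstructs the blended state. All names (`bspline_layer_weights`, `HiddenStateAutoencoder`, `blend`), dimensions, and control-point values are illustrative assumptions.

```python
# Minimal sketch of superimposing hidden states from two models.
# Assumptions: h_base and h_ft have the same shape (batch, seq, d_model);
# blending coefficients vary smoothly over layers via a clamped cubic B-spline;
# an autoencoder reconstructs the mixed hidden state.

import numpy as np
import torch
import torch.nn as nn
from scipy.interpolate import BSpline


def bspline_layer_weights(num_layers: int, control_points, degree: int = 3) -> torch.Tensor:
    """Evaluate a clamped B-spline over normalized layer positions to get blending coefficients."""
    n_ctrl = len(control_points)
    # Clamped knot vector: the spline passes through the first and last control points.
    knots = np.concatenate([
        np.zeros(degree + 1),
        np.linspace(0.0, 1.0, n_ctrl - degree + 1)[1:-1],
        np.ones(degree + 1),
    ])
    spline = BSpline(knots, np.asarray(control_points, dtype=np.float64), degree)
    positions = np.linspace(0.0, 1.0, num_layers)
    alphas = torch.tensor(spline(positions), dtype=torch.float32)
    return alphas.clamp(0.0, 1.0)  # keep coefficients in [0, 1]


class HiddenStateAutoencoder(nn.Module):
    """Compresses and reconstructs a blended hidden state (illustrative sizes)."""

    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_model, d_latent), nn.GELU())
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(h))


def blend(h_base: torch.Tensor, h_ft: torch.Tensor, alpha: float,
          autoencoder: HiddenStateAutoencoder) -> torch.Tensor:
    """Superimpose two hidden states with weight alpha, then reconstruct."""
    mixed = alpha * h_base + (1.0 - alpha) * h_ft
    return autoencoder(mixed)


if __name__ == "__main__":
    num_layers, d_model, d_latent = 12, 768, 128
    alphas = bspline_layer_weights(num_layers, control_points=[1.0, 0.8, 0.5, 0.3, 0.2])
    ae = HiddenStateAutoencoder(d_model, d_latent)
    h_base = torch.randn(2, 16, d_model)  # base-model hidden state at some layer
    h_ft = torch.randn(2, 16, d_model)    # fine-tuned-model hidden state at the same layer
    h_mixed = blend(h_base, h_ft, alphas[5].item(), ae)
    print(h_mixed.shape)  # torch.Size([2, 16, 768])
```

In practice, the autoencoder would be trained to reconstruct whichever model's hidden state matches the input distribution, which is what allows switching between the base and fine-tuned behaviors at inference; this sketch only shows the forward blending path.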