Improving Deep Learning Performance with Mixture of Experts and Sparse Activation

Abstract

The increasing complexity and scale of modern machine learning models have led to growing computational demands, raising concerns about efficiency, scalability, and adaptability. Traditional deep learning architectures often struggle to balance computational cost with model expressiveness, particularly in tasks requiring specialization across diverse data distributions. One promising solution is the use of modular architectures that allow selective activation of parameters, enabling efficient resource allocation while maintaining high performance. Mixture of Experts (MoE) is a widely adopted modular approach that partitions the model into multiple specialized experts, dynamically selecting a subset of them for each input. This technique has demonstrated remarkable success in large-scale machine learning applications, including natural language processing, computer vision, speech recognition, and recommendation systems. By leveraging sparse activation, MoE architectures achieve significant computational savings while scaling to billions of parameters. This survey provides a comprehensive overview of MoE, covering its fundamental principles, architectural variations, training strategies, and key applications. Additionally, we discuss the major challenges associated with MoE, including training stability, expert imbalance, interpretability, and hardware constraints. Finally, we explore potential future research directions aimed at improving efficiency, fairness, and real-world deployability. As machine learning continues to advance, MoE is poised to play a crucial role in the development of scalable and adaptive AI systems.
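
To make the routing idea in the abstract concrete, the following is a minimal sketch of a sparsely activated MoE layer in PyTorch. The class name `SparseMoE`, the layer sizes, and the choice of a top-2 softmax gate are illustrative assumptions, not details taken from the article; production systems typically add load-balancing losses and expert capacity limits that this toy version omits.

```python
# Minimal sketch of a sparsely activated Mixture-of-Experts layer (assumed design,
# not the architecture described in the article).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Toy MoE layer: a gating network routes each input to its top-k experts."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating network: produces one score per expert for every input.
        self.gate = nn.Linear(d_model, num_experts)
        # Pool of specialized feed-forward experts.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). Each input is processed by only its top-k experts,
        # so per-input compute grows with k rather than with the total expert count.
        scores = self.gate(x)                                 # (batch, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)   # (batch, top_k)
        weights = F.softmax(top_vals, dim=-1)                 # renormalize over selected experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                  # inputs routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoE(d_model=32, d_hidden=64, num_experts=8, top_k=2)
    tokens = torch.randn(4, 32)
    print(layer(tokens).shape)  # torch.Size([4, 32])
```

The key property illustrated here is that adding experts enlarges the parameter count without raising per-input compute, since only the gate-selected experts are ever evaluated for a given input.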
