Towards Robust and Scalable Mixture of Experts Architectures for Large Language and Vision Models
Abstract
The advent of foundation-scale deep learning models, characterized by unprecedented model sizes and multi-modal capabilities, has revitalized interest in Mixture of Experts (MoE) architectures due to their potential for efficient conditional computation and scalability. However, robustness challenges—including routing instability, expert overload, and vulnerability to distributional shifts and adversarial attacks—pose significant barriers to reliable deployment in large language and vision models. This survey presents a comprehensive and mathematically rigorous overview of robust MoE methods in the era of foundation models. We systematically examine foundational theories, algorithmic advances in capacity-aware routing and auxiliary regularization, and state-of-the-art training strategies designed to enhance robustness and scalability. Empirical evaluations across diverse language, vision, and multi-modal benchmarks highlight the strengths and limitations of current approaches. We further identify critical open problems spanning theoretical guarantees, differentiable routing optimization, multi-modal consistency, and efficient training under resource constraints. By synthesizing recent developments and articulating future directions, this survey aims to provide a unified framework for advancing robust MoE research and to facilitate the broader adoption of MoE architectures in next-generation AI systems.
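To make the routing and regularization concepts referenced above concrete, the following is a minimal, hedged sketch of a top-k MoE gate with a load-balancing auxiliary loss in the commonly used Switch-Transformer style (fraction of tokens per expert times mean gate probability per expert). It is an illustration under assumed conventions, not the specific formulation of any method surveyed here; the class name TopKGate and the parameter aux_loss_weight are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKGate(nn.Module):
    """Illustrative top-k gate with an auxiliary load-balancing loss.

    Hypothetical sketch: names and defaults are assumptions, and the
    auxiliary loss follows the widely used f_i * P_i formulation rather
    than any single surveyed method.
    """

    def __init__(self, d_model: int, num_experts: int, k: int = 2,
                 aux_loss_weight: float = 1e-2):
        super().__init__()
        self.w_gate = nn.Linear(d_model, num_experts, bias=False)
        self.num_experts = num_experts
        self.k = k
        self.aux_loss_weight = aux_loss_weight

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        logits = self.w_gate(x)                       # (tokens, experts)
        probs = F.softmax(logits, dim=-1)
        topk_vals, topk_idx = probs.topk(self.k, dim=-1)
        # Renormalize the selected gate values so they sum to 1 per token.
        gates = topk_vals / topk_vals.sum(dim=-1, keepdim=True)

        # Auxiliary load-balancing loss: f_i is the fraction of routing
        # assignments sent to expert i (each token contributes k selections),
        # P_i is the mean gate probability for expert i; a uniform assignment
        # minimizes num_experts * sum_i f_i * P_i.
        assignment = torch.zeros_like(probs).scatter(1, topk_idx, 1.0)
        tokens_per_expert = assignment.mean(dim=0)        # f_i
        mean_prob_per_expert = probs.mean(dim=0)          # P_i
        aux_loss = self.num_experts * torch.sum(
            tokens_per_expert * mean_prob_per_expert)

        return topk_idx, gates, self.aux_loss_weight * aux_loss
```

In this sketch the auxiliary term is returned scaled so it can simply be added to the task loss; capacity-aware variants discussed later in the survey additionally cap the number of tokens each expert may receive and drop or re-route the overflow.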