SymTensor: Symbolic and Adaptive Tensor Partitioning by Unified Parallelism for Deep Learning
Abstract
The rapid expansion of deep learning models in scale and structural diversity has made distributed training essential. Designing efficient parallelization strategies requires balancing computation, communication, and memory. However, existing methods struggle to coordinate multiple parallelization strategies across different model components and to adapt to changing models. This paper proposes SymTensor, a strategy generation method based on a principled tensor-level cost model that does not rely on predefined rules. SymTensor unifies different forms of parallelism in a single system and formulates a symbolic model that jointly analyzes computation, communication, and memory costs. It employs an adaptive tensor partitioning algorithm to minimize total cost, and it adapts to changes in model architectures, operator types, and input shapes. Our experiments on representative foundation models validate that SymTensor-generated strategies achieve up to more than 2x the training performance of strategies generated by the state-of-the-art Megatron-LM. Our tensor-based, symbolic-cost-driven solution provides strong efficiency, adaptability, and practicality for large-scale distributed training.
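To make the idea concrete, the following is a minimal sketch, not SymTensor's actual algorithm or API, of what a symbolic cost model for tensor partitioning can look like: it scores a few candidate 1-D partitions of a single matrix multiplication by analytical compute, communication, and memory costs, then selects the cheapest. All function names, formulas, and weights here are illustrative assumptions.

    # Hypothetical sketch of a per-operator symbolic cost model; all
    # formulas and constants below are assumptions for exposition only.
    from dataclasses import dataclass

    @dataclass
    class Cost:
        compute: float        # per-device FLOPs
        communication: float  # bytes moved between devices
        memory: float         # per-device bytes resident

        def total(self, w_comm: float = 1.0, w_mem: float = 0.0) -> float:
            # Weighted sum; a real system would calibrate weights to hardware.
            return self.compute + w_comm * self.communication + w_mem * self.memory

    def matmul_partition_costs(m: int, k: int, n: int, devices: int,
                               dtype_bytes: int = 2) -> dict:
        """Cost of C[m,n] = A[m,k] @ B[k,n] under three 1-D partitions."""
        flops = 2 * m * k * n
        return {
            # Split rows of A (data-parallel-like): no result communication
            # for this op, but B is replicated on every device.
            "split_m": Cost(flops / devices, 0.0,
                            (m * k / devices + k * n + m * n / devices) * dtype_bytes),
            # Split columns of B (tensor-parallel-like): A replicated, C sharded.
            "split_n": Cost(flops / devices, 0.0,
                            (m * k + k * n / devices + m * n / devices) * dtype_bytes),
            # Split the contraction dim k: partial sums must be all-reduced.
            "split_k": Cost(flops / devices,
                            m * n * dtype_bytes,  # simplified all-reduce volume
                            (m * k / devices + k * n / devices + m * n) * dtype_bytes),
        }

    if __name__ == "__main__":
        costs = matmul_partition_costs(m=4096, k=4096, n=4096, devices=8)
        for name, c in costs.items():
            print(f"{name}: total={c.total(w_comm=4.0):.3e}")
        best = min(costs.items(), key=lambda kv: kv[1].total(w_comm=4.0))
        print("chosen:", best[0])

A full system in this spirit would build such cost expressions for every tensor in the model graph and search the joint partitioning space, which is where an adaptive algorithm becomes necessary; this sketch only illustrates the single-operator scoring step.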