Breaking the Bottleneck: Advances in Efficient Transformer Design

Abstract

Transformers have become the backbone of numerous advancements in deep learning, excelling across domains such as natural language processing, computer vision, and scientific modeling. Despite their remarkable performance, the high computational and memory costs of the standard Transformer architecture pose significant challenges, particularly for long sequences and resource-constrained environments. In response, a wealth of research has been dedicated to improving the efficiency of Transformers, resulting in a diverse array of innovative techniques. This survey provides a comprehensive overview of these efficiency-driven advancements. We categorize existing approaches into four major areas: (1) approximating or sparsifying the self-attention mechanism, (2) reducing input or intermediate representation dimensions, (3) leveraging hierarchical and multiscale architectures, and (4) optimizing hardware utilization through parallelism and quantization. For each category, we discuss the underlying principles, representative methods, and the trade-offs involved. We also identify key challenges in the field, including balancing efficiency with performance, scaling to extremely long sequences, addressing hardware constraints, and mitigating the environmental impact of large-scale models. To guide future research, we highlight promising directions such as unified frameworks, dynamic and sparse architectures, energy-aware designs, and cross-domain adaptations. By synthesizing the latest advancements and providing insights into unresolved challenges, this survey aims to serve as a valuable resource for researchers and practitioners seeking to develop or apply efficient Transformer models. Ultimately, the pursuit of efficiency is crucial for ensuring that the transformative potential of Transformers can be realized in a sustainable, accessible, and impactful manner.
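To make the first category concrete, the sketch below shows one common way to sparsify self-attention: a sliding-window (local) attention mask, in which each query attends only to keys within a fixed distance. This is an illustrative toy implementation, not a method from the survey; for clarity it still builds the full n x n score matrix, whereas practical efficient implementations avoid materializing it.

```python
import numpy as np

def sliding_window_attention(q, k, v, window=2):
    """Toy dense sketch of local (sliding-window) attention: each query
    position attends only to keys within `window` steps, reducing the
    effective attention pattern from O(n^2) to O(n * window) entries.
    Illustrative only -- real implementations never build the full matrix."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (n, n) attention logits
    # Mask out key positions outside the local window around each query.
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -np.inf
    # Row-wise softmax over the remaining (local) positions.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = rng.standard_normal((3, n, d))
out = sliding_window_attention(q, k, v, window=2)
print(out.shape)  # (8, 4)
```

With a window of 2, each of the 8 query positions attends to at most 5 keys instead of all 8; the savings grow linearly with sequence length, which is the core idea behind local-attention variants the survey covers.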
