A Survey on Advancements in Scheduling Techniques for Efficient Deep Learning Computations on GPUs
Abstract
This survey explores recent advancements in scheduling techniques for efficient deep learning computations on GPUs. It highlights challenges related to parallel thread execution, resource utilization, and memory latency in GPUs, each of which can lead to suboptimal performance. The surveyed research focuses on novel scheduling policies that improve memory latency tolerance, exploit parallelism, and enhance GPU resource utilization. It also examines the integration of prefetching mechanisms, fine-grained warp scheduling, and warp switching strategies to optimize deep learning computations. Experimental evaluations reported in the surveyed work demonstrate significant improvements in throughput and memory bank parallelism, along with reduced latency. The insights gathered here can guide researchers, system designers, and practitioners in building more efficient and powerful deep learning systems on GPUs. Promising future research directions include advanced scheduling techniques, energy efficiency considerations, and the integration of emerging computing technologies. By continuing to advance scheduling techniques, the full potential of GPUs can be unlocked for a wide range of applications, spanning GPU-accelerated deep learning and the supporting concerns of task scheduling, resource management, and memory optimization.
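To make the latency-hiding idea behind prefetching concrete, the minimal CUDA sketch below is our own illustration rather than code from any surveyed system; the kernel name, tile size, and double-buffering layout are assumptions. It overlaps the global-memory loads for the next tile with computation on the current tile, the same principle that hardware prefetchers and latency-tolerant warp schedulers exploit at a finer grain.

// Illustrative software-prefetching sketch (double buffering in shared
// memory); all names and sizes here are hypothetical choices.
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256
#define TILES 4  // tiles processed per block (illustrative choice)

// Each block reverses TILES consecutive tiles of `in`. While tile t is
// consumed, the load for tile t+1 is already in flight, so global-memory
// latency overlaps computation instead of stalling it.
__global__ void reverse_tiles(const float *in, float *out, int n) {
    __shared__ float buf[2][TILE];
    int tid = threadIdx.x;
    int base = blockIdx.x * TILE * TILES;

    // Preload tile 0 into the first buffer.
    int i0 = base + tid;
    buf[0][tid] = (i0 < n) ? in[i0] : 0.0f;
    __syncthreads();  // tile 0 is now visible to every thread in the block

    for (int t = 0; t < TILES; ++t) {
        int cur = t & 1;
        // Issue the load for tile t+1 into the other buffer; its memory
        // latency overlaps the computation on tile t below.
        if (t + 1 < TILES) {
            int ni = base + (t + 1) * TILE + tid;
            buf[1 - cur][tid] = (ni < n) ? in[ni] : 0.0f;
        }
        // Consume tile t: emit it reversed. This reads slots written by
        // other threads, which is why the barriers are required.
        int oi = base + t * TILE + tid;
        if (oi < n) out[oi] = buf[cur][TILE - 1 - tid];
        __syncthreads();  // prefetched tile t+1 is now fully resident
    }
}

int main() {
    const int blocks = 8;
    const int n = TILE * TILES * blocks;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;
    reverse_tiles<<<blocks, TILE>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %.0f (expected %d)\n", out[0], TILE - 1);
    cudaFree(in);
    cudaFree(out);
    return 0;
}

The double-buffering pattern here is the software analogue of what the surveyed scheduling work pursues in hardware: keeping useful work available whenever a memory request is outstanding.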