A Reproducible Study and Performance Analysis of GPU Programming Paradigms: OpenACC vs. CUDA in Key Linear Algebra Computations

Abstract

Scientific and engineering problems are frequently governed by partial differential equations whose analytical solutions are often impractical, forcing the adoption of numerical methods. Basic Linear Algebra Subprograms (BLAS) operations constitute a fundamental component of these numerical approaches, encompassing Level 1 operations (dot products and vector addition), Level 2 operations (matrix-vector multiplication), and Level 3 operations (matrix-matrix multiplication). Graphics Processing Units (GPUs), particularly those produced by NVIDIA, offer substantial computational power and are extensively employed to tackle a variety of numerical problems. Nevertheless, targeting diverse GPU architectures remains challenging, particularly with respect to portability, minimizing workarounds, and achieving high performance. This study employs the directive-based OpenACC programming model to exploit GPU capabilities, and presents a comprehensive comparative study and performance evaluation of OpenACC against CUDA in executing essential BLAS routines.
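
To make the contrast concrete, the sketch below shows a vector addition (a BLAS Level 1 operation) written once with OpenACC directives and once as a CUDA kernel. This is an illustrative example only, not code from the study; the function names, block size, and data clauses are assumptions. The OpenACC variant would typically be built with an OpenACC-aware compiler such as NVIDIA's nvc, and the CUDA variant with nvcc.

    /* vec_add_acc.c -- OpenACC: the compiler generates the GPU kernel and
       handles data movement from the directive's data clauses alone. */
    void vec_add_acc(const float *a, const float *b, float *c, int n)
    {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    /* vec_add.cu -- CUDA: thread indexing, device allocation, transfers,
       and the kernel launch are all written explicitly by the programmer. */
    #include <cuda_runtime.h>

    __global__ void vec_add_kernel(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    void vec_add_cuda(const float *a, const float *b, float *c, int n)
    {
        float *d_a, *d_b, *d_c;
        size_t bytes = (size_t)n * sizeof(float);
        cudaMalloc(&d_a, bytes);
        cudaMalloc(&d_b, bytes);
        cudaMalloc(&d_c, bytes);
        cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
        vec_add_kernel<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);  /* 256 threads per block, an assumed choice */
        cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_a);
        cudaFree(d_b);
        cudaFree(d_c);
    }

The directive version delegates kernel generation and data movement to the compiler, which is the portability argument the abstract raises; the CUDA version exposes those decisions to the programmer, which is where hand-tuned performance typically comes from.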
