A Reproducible Study and Performance Analysis of GPU Programming Paradigms: OpenACC vs. CUDA in Key Linear Algebra Computations

Abstract

Scientific and engineering problems are frequently governed by partial differential equations whose analytical solutions are often impractical, forcing the adoption of numerical methods. Basic Linear Algebra Subprograms (BLAS) operations constitute a fundamental component of these numerical approaches, encompassing Level 1 operations (dot products and vector addition), Level 2 operations (matrix-vector multiplication), and Level 3 operations (matrix-matrix multiplication). Graphics Processing Units (GPUs), particularly those produced by NVIDIA, offer substantial computational power and are extensively employed to tackle a variety of numerical problems. Nevertheless, targeting diverse GPU architectures remains challenging, particularly with respect to portability, minimizing workarounds, and achieving high performance. This study employs the directive-based OpenACC programming model to exploit GPU capabilities, and presents a comprehensive comparative study and performance evaluation of OpenACC against CUDA in executing essential BLAS routines.
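
To make the contrast concrete, the sketch below shows a vector addition (a BLAS Level 1 operation) written once with OpenACC directives and once as a CUDA kernel. This is an illustrative example only, not code from the study; the function names, block size, and data clauses are assumptions. The OpenACC variant would typically be built with an OpenACC-aware compiler such as NVIDIA's nvc, and the CUDA variant with nvcc.

    /* vec_add_acc.c -- OpenACC: the compiler generates the GPU kernel and
       handles data movement from the directive's data clauses alone. */
    void vec_add_acc(const float *a, const float *b, float *c, int n)
    {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    /* vec_add.cu -- CUDA: thread indexing, device allocation, transfers,
       and the kernel launch are all written explicitly by the programmer. */
    #include <cuda_runtime.h>

    __global__ void vec_add_kernel(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    void vec_add_cuda(const float *a, const float *b, float *c, int n)
    {
        float *d_a, *d_b, *d_c;
        size_t bytes = (size_t)n * sizeof(float);
        cudaMalloc(&d_a, bytes);
        cudaMalloc(&d_b, bytes);
        cudaMalloc(&d_c, bytes);
        cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
        vec_add_kernel<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);  /* 256 threads per block, an assumed choice */
        cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_a);
        cudaFree(d_b);
        cudaFree(d_c);
    }

The directive version delegates kernel generation and data movement to the compiler, which is the portability argument the abstract raises; the CUDA version exposes those decisions to the programmer, which is where hand-tuned performance typically comes from.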
