Towards a GPU-enabled billionare SVD in pyLOM


Abstract

We develop and implement pyLOM, an accelerated, high-performance, open-source computing environment for model order reduction in fluid dynamics. It provides proper orthogonal decomposition (POD), dynamic mode decomposition and spectral proper orthogonal decomposition, all built on parallel GPU-accelerated algorithms. The library is profiled in detail on the MareNostrum V supercomputer. The largest case run has a billion nodes and a thousand snapshots, and is computed in under 20 seconds with 100 GPUs. While the studied applications are memory bound, a hybrid parallel randomized QR factorization has been found able to handle such large matrices. The largest speedup, a factor of 83, has been found for the QR factorization, while the matrix-matrix multiplication has shown a speedup factor of about 2. Additionally, two application examples are provided: the flow around a cylinder at $Re_D=10^4$ and the Windsor body at a Reynolds number of $Re_L = 2.9\times10^6$. The largest dataset of the Windsor body consists of 422 snapshots on a grid of 1.4 billion nodes, and its POD is computed in under 3 seconds with 100 GPUs. This showcases the efficiency of GPUs, resulting in a 97% reduction in energy to solution and a reduction of 0.11 kg of $CO_2$ emissions. The scalability and efficiency achieved suggest that this framework can play a key role in reducing the energy demands and environmental impact of large-scale data analysis and model order reduction across a wide range of applications.
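To make the idea of a POD built on a randomized QR factorization concrete, the following is a minimal single-process NumPy sketch of the generic randomized range-finder approach. It is not pyLOM's API or its parallel GPU implementation: the function name `randomized_pod` and the parameters `oversample` and `n_power_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def randomized_pod(X, rank, oversample=10, n_power_iter=1, seed=0):
    """Approximate POD of a snapshot matrix X (n_points x n_snapshots).

    Illustrative sketch only (hypothetical helper, not part of pyLOM).
    Returns spatial modes U, singular values S and temporal coefficients Vt.
    """
    rng = np.random.default_rng(seed)
    n_snap = X.shape[1]
    k = min(rank + oversample, n_snap)  # width of the random sketch

    # POD is usually applied to fluctuations: remove the temporal mean.
    X_fluct = X - X.mean(axis=1, keepdims=True)

    # Sample the column space of X_fluct with a Gaussian test matrix.
    Omega = rng.standard_normal((n_snap, k))
    Q, _ = np.linalg.qr(X_fluct @ Omega)

    # Optional power iterations sharpen the basis when the singular
    # values decay slowly; re-orthonormalize at each step for stability.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(X_fluct.T @ Q)
        Q, _ = np.linalg.qr(X_fluct @ Z)

    # Project onto the small basis and take a cheap dense SVD there.
    B = Q.T @ X_fluct  # shape (k, n_snap)
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small  # lift the modes back to the full space

    return U[:, :rank], S[:rank], Vt[:rank, :]
```

Calling `randomized_pod(X, rank=r)` on an array of shape `(n_points, n_snapshots)` returns the leading `r` modes. In this sketch the expensive operations are the tall QR factorizations and the matrix-matrix products, the same two kernels whose speedups the abstract reports.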
