Toward a GPU-enabled billionaire SVD in pyLOM
Abstract
We develop and implement pyLOM, an accelerated, high-performance, open-source computing environment for model order reduction in fluid dynamics. It contains singular value decomposition-based algorithms implemented for massively parallel GPU architectures. The library is profiled in detail on the MareNostrum V supercomputer. The largest case, a matrix of a billion nodes by a thousand snapshots, was computed in under 20 s with 100 GPUs. A hybrid CPU-GPU parallel randomized QR factorization was found to handle such large matrices. The largest speedup factor, 83, was found for the QR factorization, while the matrix–matrix multiplication showed a speedup factor of about 2. Additionally, two application examples are provided, the flow around a cylinder and the Windsor body, whose proper orthogonal decomposition (POD) is computed in under 3 s with 100 GPUs. This showcases the efficiency of GPUs, resulting in a 97% reduction in energy to solution and a reduction of 0.11 kg of CO₂ emissions. The scalability and efficiency achieved suggest that this framework can play a key role in reducing the energy demands and environmental impact of large-scale data analysis and model order reduction across a wide range of applications.
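To make the idea of a randomized, QR-based SVD concrete, the following is a minimal single-process sketch in NumPy. It is not pyLOM's API or its parallel CPU-GPU implementation; the function name `randomized_pod` and the parameters `oversample` and `n_power_iter` are hypothetical and chosen for illustration. The sketch assumes a snapshot matrix of shape (nodes, snapshots) that fits in memory and follows the generic randomized range-finder approach: project onto a random subspace, orthonormalize with QR, and take the SVD of the resulting small matrix.

```python
import numpy as np


def randomized_pod(X, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate the leading `rank` POD modes of X (nodes x snapshots).

    Illustrative only: names and defaults are assumptions, not pyLOM's API.
    """
    rng = np.random.default_rng(seed)
    n_snap = X.shape[1]
    k = min(rank + oversample, n_snap)  # size of the random subspace

    # Sketch the column space of X with a Gaussian test matrix.
    omega = rng.standard_normal((n_snap, k))
    Y = X @ omega

    # Power iterations sharpen the subspace; QR keeps it well conditioned.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(Y)
        Z, _ = np.linalg.qr(X.T @ Q)
        Y = X @ Z

    # Orthonormal basis for the approximate range of X.
    Q, _ = np.linalg.qr(Y)

    # SVD of the small (k x snapshots) projected matrix.
    B = Q.T @ X
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Lift the left singular vectors back to the full node space.
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]


if __name__ == "__main__":
    # Synthetic tall-and-skinny matrix of exact rank 5 (made-up data).
    rng = np.random.default_rng(1)
    A = rng.standard_normal((2000, 5)) @ rng.standard_normal((5, 50))
    U, s, Vt = randomized_pod(A, rank=5)
    rel_err = np.linalg.norm((U * s) @ Vt - A) / np.linalg.norm(A)
    print(f"relative reconstruction error: {rel_err:.2e}")
```

In a distributed setting the tall matrix would be partitioned by rows across processes, so the QR steps and the products with `X` are where a parallel factorization and communication would enter; that part is deliberately left out of this sketch.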