Racing to Idle: Energy Efficiency of Matrix Multiplication on Heterogeneous CPU and GPU Architectures
Abstract
Heterogeneous computing has emerged as an essential approach to overcoming the power and thermal constraints that have stalled single-core processor scaling. By integrating multi-core CPUs with discrete and integrated GPUs, modern systems promise substantial gains in both performance and energy efficiency, yet the practical magnitude of these benefits on consumer hardware remains underexplored. This study presents a rigorous experimental comparison of a canonical matrix-matrix multiplication workload across three architectures within a single, widely available laptop: a multi-core AMD Ryzen 7 CPU, a discrete NVIDIA GeForce GPU, and an integrated AMD Radeon Vega GPU. Using minimally intrusive, production-grade measurement tools, we deliver a transparent, quantitative analysis of the real-world trade-offs between speed and energy consumption. The results demonstrate that the discrete GPU not only provides a 93-fold speedup over the CPU but also achieves more than 50 times greater energy efficiency, consuming just 2% of the energy required by the CPU for the same computation. These findings provide direct evidence for the "race to idle" principle: for minimizing total energy-to-solution, peak instantaneous power matters less than completing the workload quickly and returning to idle. Overall, this work offers empirical guidance for practitioners designing energy-aware high-performance computing systems, demonstrating that architectural specialization is critical for unlocking orders-of-magnitude improvements in computational efficiency on widely accessible platforms.
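The arithmetic behind the race-to-idle claim can be sketched as follows. This is an illustrative calculation, not the authors' measurement code. It assumes energy-to-solution is simply average power multiplied by runtime, and it uses only the two rounded figures quoted in the abstract (a 93-fold speedup and roughly 2% of the CPU's energy). The function name `implied_power_ratio` and the constant names are hypothetical, and the derived power ratio is therefore approximate.

```python
def implied_power_ratio(speedup: float, energy_fraction: float) -> float:
    """Return the average-power ratio P_gpu / P_cpu implied by E = P * t.

    With E = P * t for each device:
        E_gpu / E_cpu = (P_gpu * t_gpu) / (P_cpu * t_cpu)
                      = (P_gpu / P_cpu) / speedup
    so P_gpu / P_cpu = energy_fraction * speedup.
    """
    if speedup <= 0 or energy_fraction <= 0:
        raise ValueError("speedup and energy_fraction must be positive")
    return energy_fraction * speedup


# Rounded figures taken from the abstract; treat them as approximate.
SPEEDUP = 93.0          # t_cpu / t_gpu
ENERGY_FRACTION = 0.02  # E_gpu / E_cpu ("just 2%")

power_ratio = implied_power_ratio(SPEEDUP, ENERGY_FRACTION)
efficiency_gain = 1.0 / ENERGY_FRACTION

print(f"Implied average power ratio (GPU/CPU): {power_ratio:.2f}")
print(f"Energy-efficiency gain (CPU energy / GPU energy): {efficiency_gain:.0f}x")
```

Under these assumptions the discrete GPU would draw somewhat under twice the CPU's average power during the run, yet finish so much sooner that its total energy is far lower, which is the trade-off the race-to-idle principle describes.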