Characterization of high-resolution AI data center training workloads on single and multiple GPU nodes

Abstract

The rapid advancement of Artificial Intelligence (AI) is driving unprecedented computational demands, posing significant challenges to data center infrastructure and threatening the stability and resilience of modern power grids. This study presents an open-access dataset of diverse AI training sessions recorded at sub-second resolution, designed to advance research on the energy consumption profiles of AI workloads and their interactions with power grid dynamics in data center environments. The dataset comprises 32 training sessions on high-performance NVIDIA H100 and B200 8-GPU nodes and 40 sessions on consumer-grade NVIDIA GeForce RTX 3060 GPUs, encompassing over 1.8 million samples. Each session records power demand, CPU and GPU utilization, per-GPU power, memory usage, and temperature across a range of AI tasks, including forecasting, classification, reinforcement learning, and text and image generation. Data quality was verified through detailed technical validation covering timing accuracy, conformance to hardware limits, and cross-metric correlation analysis. Measurements remained within manufacturer-specified thermal and power envelopes, and the observed correlations among power, utilization, temperature, and current were consistent with established processor and GPU behavior. The dataset provides a robust foundation for modeling the energy behavior of AI data centers, for system-level performance analysis, and for assessing the impact of data center loads on the power grids they connect to.
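To illustrate the kind of technical validation described above, the following is a minimal Python sketch of per-session checks on timing accuracy, hardware limit conformance, and cross-metric correlation. The column names (`timestamp`, `gpu_power_w`, `gpu_util_pct`, `gpu_temp_c`) and the file `session_01.csv` are hypothetical placeholders, not the dataset's actual schema, and the 700 W bound shown is the rated TDP of an H100 SXM board; adjust per device.

```python
# Minimal validation sketch; column names and file path are hypothetical,
# the published dataset's schema may differ.
import pandas as pd


def validate_session(csv_path: str) -> pd.DataFrame:
    df = pd.read_csv(csv_path, parse_dates=["timestamp"])

    # Timing accuracy: consecutive samples should stay at sub-second spacing.
    intervals = df["timestamp"].diff().dt.total_seconds().dropna()
    assert intervals.max() < 1.0, "sampling gap exceeds 1 s"

    # Hardware limit conformance: per-GPU power must remain inside the
    # manufacturer's envelope (700 W TDP for an H100 SXM, as an example).
    assert (df["gpu_power_w"] <= 700).all(), "GPU power above rated envelope"

    # Cross-metric correlation: on a busy training node, power, utilization,
    # and temperature are expected to be positively correlated.
    return df[["gpu_power_w", "gpu_util_pct", "gpu_temp_c"]].corr()


if __name__ == "__main__":
    print(validate_session("session_01.csv"))
```

A correlation matrix with clearly positive off-diagonal entries would be consistent with the expected coupling between load, power draw, and temperature; weak or negative correlations on an active node would flag a session for closer inspection.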
