EcoRL-Sched: Energy-Aware Heterogeneous GPU–FPGA Task Scheduling for Sustainable RLHF Training Pipelines
Abstract
Reinforcement Learning from Human Feedback (RLHF) has become the dominant post-training paradigm for aligning large language models (LLMs), yet it remains among the most energetically expensive workloads in modern AI infrastructure. Existing RLHF frameworks optimise primarily for throughput on homogeneous GPU clusters, neglecting the severe energy inefficiencies inherent in the multi-stage RLHF pipeline. We identify a fundamental and previously unexploited structural asymmetry: inference stages (Reward Model, Reference Policy, Critic) draw 60–75% less power per GPU than training stages, and their predictable single-pass computation maps naturally to FPGA accelerators. We present EcoRL-Sched, an energy-aware heterogeneous GPU–FPGA task scheduling framework comprising three tightly integrated innovations: (1) a power-profiling subsystem that characterises per-stage, per-model-size energy density via a novel Energy Density Index (EDI) metric; (2) an FPGA offloading engine on Xilinx Alveo U55C that achieves 4.9× better tokens/Joule than H100 GPUs for reward and reference inference and runs concurrently with GPU training via a latency-overlap protocol; and (3) a dynamic scheduler, implemented as a lightweight PPO-trained policy network, that uses real-time power telemetry and ROLL multi-task workloads to minimise pipeline bubbles and idle GPU cycles. Across 8B, 70B, and 405B parameter models on a 32-GPU H100 cluster, EcoRL-Sched achieves up to 14.6× throughput speedup, 38.4% energy reduction, 40.6% CO₂ reduction, and 51% faster convergence on ROLL benchmarks, all without degrading model quality. Lifecycle analysis confirms that net carbon benefits exceed FPGA manufacturing overhead by >30×.
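The abstract names the Energy Density Index (EDI) without defining it. As an illustration only, the sketch below assumes EDI is energy consumed per token processed by a pipeline stage (Joules/token), integrated from fixed-interval power telemetry; the function name and formula are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of an Energy Density Index (EDI). The paper does not
# give the formula; here EDI is ASSUMED to mean Joules per token for one
# pipeline stage, integrated from power telemetry samples.

def energy_density_index(power_samples_w, interval_s, tokens):
    """Integrate power samples (Watts) taken at a fixed interval into
    Joules via rectangle-rule integration, then normalise by the number
    of tokens processed to obtain J/token."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    energy_j = sum(power_samples_w) * interval_s  # W * s = J
    return energy_j / tokens

# Example: a reward-model inference stage drawing ~180 W, sampled 10 times
# at 1 s intervals, while processing 60,000 tokens.
edi = energy_density_index([180.0] * 10, 1.0, 60_000)
print(f"{edi:.4f} J/token")  # 1800 J / 60000 tokens = 0.0300 J/token
```

Under this assumption, a low-power inference stage would show a markedly lower EDI than a training stage, which is the asymmetry the profiling subsystem is described as exploiting.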