Achieving Scale-Invariant Reinforcement Learning Performance with Reward Range Normalization

Abstract

The performance of standard reinforcement learning (RL) algorithms depends on the scale of the rewards they aim to maximize. Various strategies have been proposed to tackle this issue, but they usually restrict the algorithms' ability to adapt to varying task conditions and to evolve in open-ended environments. Inspired by human cognitive processes, we propose leveraging a well-known cognitive bias, reward range normalization, to develop scale-invariant RL algorithms. We compare a classical RL algorithm with one that uses reward range adaptation and find that, while the standard model's accuracy is limited to certain reward magnitudes, the range-adapted model maintains consistent performance across all magnitudes. We next show that range adaptation acts as an autonomous adjustment of the exploration rate, keeping it close to optimal for any given reward scale. Finally, we show that the range-adapted model extends effectively to more complex, noisy, dynamic, and multi-step tasks.
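To make the idea concrete, here is a minimal sketch of range normalization in a two-armed bandit. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the model, so the delta-rule learner, the softmax (logistic) choice rule, the running minimum and maximum used to estimate the range, and all names and parameter values (`run_bandit`, `alpha`, `beta`, `p_win`) are hypothetical. For this kind of learner, multiplying all rewards by a constant has the same effect on choices as multiplying the softmax inverse temperature `beta` by that constant, which is one way to see why rescaling rewards to a fixed range stabilizes the effective exploration rate.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def run_bandit(scale, normalize, n_trials=200, alpha=0.3, beta=5.0, seed=0):
    """Simulate a two-armed bandit; return the list of choices (1 = better arm).

    Each arm pays `scale` with probability p_win[arm] and 0 otherwise.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    p_win = (0.25, 0.75)                   # arm 1 is the better option
    q = [0.0, 0.0]                         # value estimates for both arms
    r_min, r_max = math.inf, -math.inf     # running estimate of the reward range
    choices = []
    for _ in range(n_trials):
        # A softmax over two options reduces to a logistic of the value difference.
        arm = 1 if rng.random() < sigmoid(beta * (q[1] - q[0])) else 0
        reward = scale if rng.random() < p_win[arm] else 0.0
        if normalize:
            r_min, r_max = min(r_min, reward), max(r_max, reward)
            span = r_max - r_min
            # Assumption: count the reward as 0 until a non-zero range is observed.
            reward = (reward - r_min) / span if span > 0 else 0.0
        q[arm] += alpha * (reward - q[arm])  # delta-rule update
        choices.append(arm)
    return choices


def accuracy(choices):
    """Fraction of trials on which the better arm was chosen."""
    return sum(choices) / len(choices)


# Compare the standard and range-normalized learners across reward scales.
for scale in (0.01, 1.0, 100.0):
    standard = accuracy(run_bandit(scale, normalize=False))
    adapted = accuracy(run_bandit(scale, normalize=True))
    print(f"scale={scale:>6}: standard={standard:.2f}  range-normalized={adapted:.2f}")
```

With a fixed seed, the range-normalized learner in this sketch makes the same sequence of choices at every reward scale, because the normalized rewards it learns from are identical. The standard learner's choices depend on the scale, since the scale multiplies the value difference that enters the softmax.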
