Achieving Scale-Invariant Reinforcement Learning Performance with Reward Range Normalization
Abstract
The performance of standard reinforcement learning (RL) algorithms depends on the scale of the rewards they aim to maximize. Various strategies have been proposed to tackle this issue, but they usually restrict the algorithms' ability to adapt to varying task conditions and to evolve in open-ended environments. Inspired by human cognitive processes, we propose leveraging a well-known cognitive bias, reward range normalization, to develop scale-invariant RL algorithms. We compare a classical RL algorithm to one leveraging reward range adaptation and find that, while the standard model's accuracy is limited to certain reward magnitudes, the range-adapted model maintains consistent performance across all magnitudes. We next show that range adaptation acts by autonomously adjusting the exploration rate, keeping it close to optimal for any given reward scale. Finally, we show that the range-adapted model extends effectively to more complex, noisy, dynamic, and multi-step tasks.
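To make the mechanism concrete, the sketch below implements a simple range-adapted bandit learner alongside a standard delta-rule learner. The environment, parameter values (alpha, beta), and the specific running-range update are illustrative assumptions, not the article's implementation. Rewards are rescaled into [0, 1] using running estimates of the smallest and largest rewards observed so far, so a fixed softmax temperature yields a comparable exploration rate at any reward scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta):
    """Softmax policy with inverse temperature beta (numerically stable)."""
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

def run_bandit(reward_means, scale, alpha=0.3, beta=5.0,
               n_trials=500, range_adapted=True):
    """Two-armed bandit learner; optionally rescales rewards by a running
    estimate of the reward range before the value update (illustrative)."""
    q = np.zeros(len(reward_means))
    r_min, r_max = np.inf, -np.inf              # running range estimates
    correct = 0
    for _ in range(n_trials):
        a = rng.choice(len(q), p=softmax(q, beta))
        r = scale * (reward_means[a] + 0.1 * rng.standard_normal())
        r_min, r_max = min(r_min, r), max(r_max, r)
        if range_adapted:
            # map reward into [0, 1]; use the midpoint until a range exists
            r = (r - r_min) / (r_max - r_min) if r_max > r_min else 0.5
        q[a] += alpha * (r - q[a])              # standard delta-rule update
        correct += (a == int(np.argmax(reward_means)))
    return correct / n_trials

# Accuracy of the range-adapted learner stays roughly stable across reward
# scales, whereas the fixed-temperature standard learner degrades.
for s in (0.1, 1.0, 100.0):
    print(s, run_bandit([0.2, 0.8], s, range_adapted=True),
             run_bandit([0.2, 0.8], s, range_adapted=False))
```

With a fixed beta, the unnormalized learner is near-random at small reward scales and overly greedy at large ones, while the range-adapted learner behaves as if the rewards were always on a unit scale, which is the scale-invariance described above.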