RL or not RL? Parsing the processes that support human reward-based learning.
Abstract
Reinforcement Learning (RL) algorithms have had tremendous success accounting for reward-based learning across species, in both behavior and brain. In particular, simple model-free RL models, such as delta-rule or Q-learning, are routinely used to model instrumental learning in bandit tasks, and they capture variance in brain signals. However, reward-based learning in humans recruits multiple processes, including high-level processes such as memory and low-level ones such as choice perseveration; their contributions can easily be mistakenly attributed to RL computations. Here, we investigate how much of RL-like behavior is supported by RL computations in a context where other processes can be factored out. Re-analysis and computational modeling of seven data sets spanning hundreds of participants show that in this instrumental context, reward-based learning is best explained by a combination of working memory and a habit-like associative process, with no RL-like value-based incremental learning. Simulations show that this combination nevertheless approximates the adaptive policy of a value-based RL agent, explaining why RL computations are mistakenly inferred when working memory is not parsed out. Our results raise important questions about the interpretation of RL as a meaningful process across brain and behavior, and call for a reconsideration of how reinforcement-learning findings are interpreted across levels of analysis.
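For readers unfamiliar with the model-free RL models mentioned above, the sketch below illustrates a standard delta-rule (Q-learning) agent on a two-armed bandit: action values are updated by a reward prediction error and choices are sampled with a softmax policy. This is a generic textbook formulation, not the authors' own model; the parameter values, reward probabilities, and function names are illustrative assumptions.

```python
import numpy as np

def softmax(q, beta):
    """Map Q-values to choice probabilities with inverse temperature beta."""
    z = beta * q - np.max(beta * q)   # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def delta_rule_bandit(reward_probs, alpha=0.1, beta=5.0, n_trials=200, seed=None):
    """Simulate a delta-rule (Q-learning) agent on a stationary bandit.

    reward_probs: per-arm Bernoulli reward probabilities (illustrative values).
    Returns the sequence of choices and the final Q-values.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(reward_probs)
    q = np.zeros(n_arms)                       # initial action values
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        a = rng.choice(n_arms, p=softmax(q, beta))   # sample an action
        r = float(rng.random() < reward_probs[a])    # Bernoulli reward outcome
        q[a] += alpha * (r - q[a])                   # delta-rule update: Q += alpha * prediction error
        choices[t] = a
    return choices, q

choices, q = delta_rule_bandit(reward_probs=[0.8, 0.2], seed=0)
print(q)   # Q-values should approach the true reward probabilities
```

The paper's argument is that behavior which such a model fits well can instead arise from working memory plus a habit-like associative process, so a good fit of this kind of update rule does not by itself establish RL-like incremental value learning.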