The effect of reward expectancy on different types of exploration in human reinforcement learning

Abstract

How humans resolve the exploit-explore dilemma in complex environments is an important open question. Previous studies suggest that the level of reward expectancy affects the degree of exploration. However, it remains unclear (1) whether this effect differs by type of exploration (i.e., random or directed exploration) and (2) whether the effect can really be attributed to reward expectancy. In this preregistered study, we addressed these two questions by extending a recently developed multi-armed bandit task that can dissociate the uncertainty and novelty of stimuli. To isolate the effect of reward expectancy, we manipulated reward across blocks by its magnitude rather than its probability, because reward probability also affects the controllability of outcomes. Participants (n = 198) made more optimal choices when relative reward expectancy was high. Behavioral analysis with computational modeling revealed that higher reward expectancy reduced the degree of random exploration but had little effect on uncertainty- and novelty-based exploration. These results suggest that humans modulate the degree of random exploration according to the relative level of reward expectancy in the environment. Combined with findings from previous studies, they also raise the possibility that controllability influences the exploration-exploitation balance in human reinforcement learning.
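
For readers unfamiliar with the distinction between random and directed exploration, the following is a minimal illustrative sketch, not the model used in the study. It assumes a common formulation in which random exploration is governed by a softmax inverse temperature applied to expected value, while directed exploration enters as additive bonuses for uncertainty and novelty. The function name and all parameter names (`inverse_temperature`, `uncertainty_weight`, `novelty_weight`) are hypothetical.

```python
import math


def choice_probabilities(values, uncertainties, novelties,
                         inverse_temperature, uncertainty_weight, novelty_weight):
    """Return softmax choice probabilities for each bandit arm.

    Assumed (hypothetical) parameterization:
      - inverse_temperature scales expected value; lower values flatten the
        distribution, which corresponds to more random exploration.
      - uncertainty_weight and novelty_weight add bonuses for uncertain or
        novel arms, which corresponds to directed exploration.
    """
    logits = [
        inverse_temperature * v + uncertainty_weight * u + novelty_weight * n
        for v, u, n in zip(values, uncertainties, novelties)
    ]
    # Subtract the maximum logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    values = [1.0, 0.5, 0.0]          # illustrative expected values per arm
    no_bonus = [0.0, 0.0, 0.0]
    # A higher inverse temperature concentrates choices on the best arm.
    print(choice_probabilities(values, no_bonus, no_bonus, 0.5, 0.0, 0.0))
    print(choice_probabilities(values, no_bonus, no_bonus, 5.0, 0.0, 0.0))
```

Under this assumed parameterization, the reported finding would correspond to a higher inverse temperature in blocks with high reward expectancy, with the two bonus weights remaining largely unchanged.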
