Information Uncertainty Influences Learning Strategy from Sequentially Delayed Rewards

Abstract

When a reward arrives after a sequence of multiple events, how do we determine which event caused it? This problem, known as temporal credit assignment, can be difficult for humans to solve in a complex and uncertain environment. It is not clear whether people adjust their strategies for tackling this problem based on the uncertainty of the environment. To address this, we adapted a reward learning task that creates a temporal credit assignment problem by combining sequentially delayed rewards, intervening events, and varying uncertainty via the amount of information presented during feedback. Using computational modeling, we developed two learning strategies: eligibility trace, whereby previously selected actions are updated as a function of the temporal sequence, and tabular update, whereby only systematically related past actions (rather than unrelated intervening events) are updated. We hypothesized that reduced uncertainty would correlate with increased use of the tabular strategy, given that model's capacity to incorporate additional feedback information. Our results supported this hypothesis. Both models effectively learned the task, and choices made by participants (N=142) were best explained by a hybrid model that combined both strategies. However, the tabular model outperformed the eligibility trace model under conditions of low uncertainty, as evidenced by more accurate predictions of participants' behavior and an increased tabular weight parameter. These findings provide new insights into the mechanisms humans use to solve temporal credit assignment and how they adapt their strategy to the uncertainty and observability of the environment.
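
The contrast between the two strategies can be illustrated with a minimal Python sketch. This is not the authors' model: the abstract gives no equations, so the delta-rule form, the decay parameter `lam`, the learning rate `alpha`, the index `related_index`, and the way the hybrid mixes the two updates through a weight `w` are all assumptions made for illustration only.

```python
import numpy as np

def eligibility_trace_update(q, chosen_actions, reward, alpha=0.3, lam=0.6):
    """Credit every previously chosen action, scaled down by its temporal distance.

    Assumed form: the action chosen k steps before the reward gets weight lam**k.
    """
    q = q.copy()
    n = len(chosen_actions)
    for i, action in enumerate(chosen_actions):
        steps_back = n - 1 - i  # 0 for the most recent action
        q[action] += alpha * (lam ** steps_back) * (reward - q[action])
    return q

def tabular_update(q, chosen_actions, reward, related_index, alpha=0.3):
    """Credit only the past action that is systematically related to the reward.

    related_index (hypothetical) marks that action's position in the sequence;
    intervening actions are left untouched.
    """
    q = q.copy()
    action = chosen_actions[related_index]
    q[action] += alpha * (reward - q[action])
    return q

def hybrid_update(q, chosen_actions, reward, related_index, w,
                  alpha=0.3, lam=0.6):
    """Mix the two updates; w = 1 is purely tabular, w = 0 purely eligibility trace."""
    q_trace = eligibility_trace_update(q, chosen_actions, reward, alpha, lam)
    q_table = tabular_update(q, chosen_actions, reward, related_index, alpha)
    return w * q_table + (1.0 - w) * q_trace

# Example: action 2 earned the reward, but two unrelated choices intervened.
q0 = np.zeros(4)
sequence = [2, 0, 3]
q_trace = eligibility_trace_update(q0, sequence, reward=1.0)
q_table = tabular_update(q0, sequence, reward=1.0, related_index=0)
q_mixed = hybrid_update(q0, sequence, reward=1.0, related_index=0, w=0.5)
```

In this toy case the eligibility trace gives the most credit to the most recent, unrelated action, while the tabular update assigns credit only to the action that actually produced the reward. Under these assumptions, a larger `w` corresponds to heavier reliance on the tabular strategy.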
