Delayed reward information is underweighted in reinforcement learning with dispersed feedback
Abstract
Learning is fundamental to adaptive behavior. In a typical learning task, each action is associated with only one outcome, which may be immediate or delayed. However, actions often have multiple consequences that unfold over time. Here, we used behavioral and eye-tracking experiments to study how people learn when their choices yield both immediate and delayed reward information. Importantly, the rewards themselves were all delivered at the end of the study, so there was no reason to weight immediate and delayed reward information differently. Nevertheless, we found that our subjects overweighted immediate reward information. Moreover, this bias increased over the course of the experiment and was still present when learning from others’ choices. The gaze data provide mixed evidence that subjects looked more at immediate than at delayed feedback, and across subjects, the relative dwell proportion did not predict the behavioral bias. Our results indicate that people prioritize not just immediate rewards, but immediate reward information. Unlike temporal discounting, this form of impatience is a clear mistake and leads to objectively worse outcomes.