Biased processing of multiple outcomes in human reinforcement learning: evidence from computational modeling and eye-tracking


Abstract

In many circumstances, choices result in multiple simultaneous outcomes, all of which should be integrated to optimally update reward expectations. Yet, to date, empirical investigations of reinforcement learning have mostly focused on situations where choices deliver only one outcome at a time. To understand how humans learn from multiple outcomes, we designed a new reinforcement learning task in which selecting an option produced two outcomes drawn from the same underlying distribution of potential gains and losses, even though only one of the two ultimately counted toward the final payoff. Behavioral results show that both equally informative outcomes are taken into account, yet asymmetrically as a function of their valence. This behavioral observation is backed up by computational analyses showing that, on top of the previously documented asymmetric update, multiple-outcome integration is biased such that rewards are overweighted relative to punishments. Behavioral and computational results were paralleled by eye-tracking analyses showing that attention deployment is biased by outcome valence and relevance. The main results were confirmed in a second experiment featuring complete feedback information. Overall, our findings suggest that, when options deliver multiple discordant outcomes, losses tend to be neglected relative to gains.
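The two mechanisms the abstract describes — asymmetric updating and biased integration of two simultaneous outcomes — can be illustrated with a minimal sketch. This is not the authors' fitted model; the function name and all parameter values (`alpha_pos`, `alpha_neg`, `w_better`) are hypothetical, chosen only to show how a Rescorla-Wagner update could overweight the better of two outcomes.

```python
def update_value(V, outcomes, alpha_pos=0.3, alpha_neg=0.1, w_better=0.7):
    """Illustrative single-trial update from two simultaneous outcomes.

    Hypothetical parameters (not from the article):
      alpha_pos / alpha_neg -- learning rates for positive / negative
        prediction errors (asymmetric update)
      w_better -- weight given to the better of the two outcomes during
        integration (0.5 would be unbiased; >0.5 overweights rewards)
    """
    # Biased integration: overweight the more favorable outcome.
    hi, lo = max(outcomes), min(outcomes)
    R = w_better * hi + (1 - w_better) * lo
    # Asymmetric Rescorla-Wagner update: the learning rate depends on
    # the sign of the prediction error.
    delta = R - V
    alpha = alpha_pos if delta > 0 else alpha_neg
    return V + alpha * delta
```

With these toy values, a gain/loss pair of (+1, -1) is integrated to an effective reward of 0.4 rather than 0, so discordant losses are partially neglected, mirroring the pattern the abstract reports.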
