Now or later: A reinforcement learning model of behavioural delay

Sahiti Chebolu
Peiyuan Zhang
Wei Ji Ma
Peter Dayan

Abstract

(Other) people notoriously delay the initiation and completion of work, sometimes beyond what is optimal. One prominent, and indeed experimentally validated, explanation is based on the fact that rewards delivered in the future are discounted. However, other factors can interact with discounting and affect policies, such as the amount of effort and the probability of successful completion. These have received less empirical attention. Here, we build and fit a new reinforcement learning model to the working trajectories of students over the course of a semester in a real-world task (P. Y. Zhang & Ma, 2024). We show that discounting, effort, and efficacy are all important in explaining students’ delays. In addition, the discount factors inferred from task performance correlate significantly with self-reported measures of impulsivity and procrastination, as well as discount rates estimated from a monetary delay discounting task, highlighting that they robustly capture meaningful individual differences in temporal preferences.
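To make the interaction between discounting, effort, and efficacy concrete, here is a toy sketch. It is not the authors' model: the single-unit task, the reward paid only at the deadline, and every name and default value (`first_work_day`, `horizon`, `discount`, `efficacy`, `effort`, `reward`) are assumptions chosen for illustration. It runs backward induction over a finite horizon and reports the earliest day on which working is better than waiting.

```python
from typing import Optional


def first_work_day(
    horizon: int = 5,        # number of days before the deadline (assumed)
    discount: float = 0.95,  # per-day discount factor (assumed)
    efficacy: float = 1.0,   # probability that a day of work completes the task (assumed)
    effort: float = 1.0,     # immediate cost of one day of work (assumed)
    reward: float = 10.0,    # reward paid at the deadline if the task is done (assumed)
) -> Optional[int]:
    """Earliest day on which working beats waiting while the task is unfinished."""
    value_unfinished = 0.0  # at the deadline, an unfinished task earns nothing
    first = None
    # Backward induction from the last working day to day 0.
    for day in range(horizon - 1, -1, -1):
        # Value, seen from day + 1, of already being done (reward arrives at the deadline).
        value_if_done_next = discount ** (horizon - day - 1) * reward
        q_work = -effort + discount * (
            efficacy * value_if_done_next + (1 - efficacy) * value_unfinished
        )
        q_wait = discount * value_unfinished
        if q_work > q_wait:  # ties default to waiting
            first = day
        value_unfinished = max(q_work, q_wait)
    return first


print(first_work_day(efficacy=1.0))  # certain success
print(first_work_day(efficacy=0.5))  # uncertain success
print(first_work_day(effort=20.0))   # effort exceeds any attainable reward
```

Under these assumed values, certain success makes it optimal to wait until the last day (day 4), because postponing the effort cost is free of risk. With an efficacy of 0.5, starting on day 0 becomes optimal, since early attempts leave room to retry. When effort outweighs the discounted reward, the function returns `None` and the agent never works.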

Version published to 10.31234/osf.io/a6nz8_v1 on OSF Preprints
Apr 19, 2026

Related articles

Biased processing of multiple outcomes in human reinforcement learning: evidence from computational modeling and eye-tracking

This article has 6 authors:
1. Henri Vandendriessche
2. Gruson Charlotte
3. Antonios Nasioulas
4. Camille Straboni
5. Maël Lebreton
6. Stefano Palminteri
This article has no evaluations. Latest version: Apr 11, 2026

Value-based control over memory encoding and search

This article has 3 authors:
1. Sven Wientjes
2. Clay B. Holroyd
3. Sean Matthew Polyn
This article has no evaluations. Latest version: Apr 18, 2026

The blessing and curse of Value-Shaping imitation

This article has 3 authors:
1. Isabelle Hoxha
2. Alice Milford Asseo
3. Stefano Palminteri
This article has no evaluations. Latest version: Apr 10, 2026
