Optimal composition of multiple value functions allows efficient, safe and stable dopamine-mediated learning
Abstract
The seminal reward prediction error theory of dopamine function faces several key challenges, most notably the difficulty of learning about multiple rewards simultaneously, the inefficiency of on-policy learning, and the need to account for heterogeneous striatal responses in the tail of the striatum. We propose a normative framework, based on linear reinforcement learning, that redefines dopamine’s computational objective: dopamine optimises not simply cumulative reward, but a reward value function augmented by a penalty for deviating from a default behavioural policy, which effectively confers value on controllability. Our simulations show that this single modification enables optimal composition of value functions, fast and robust adaptation to changing priorities, safer exploration in the presence of threats, and stable learning amid uncertainty. Critically, the framework unifies disparate striatal observations, parsimoniously reconciling threat and action prediction error signals in the tail of the striatum. It thus refines the core principle governing striatal dopamine, bridging theory with neural data and offering testable predictions.
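A minimal sketch of the linear reinforcement learning setting the abstract invokes, using standard linearly solvable MDP notation (the symbols \lambda, \pi^{d}, P^{d} and z below are our shorthand, not taken from the paper): the agent maximises expected reward minus a penalty for deviating from a default policy \pi^{d},

v^{*}(s) = \max_{\pi} \, \mathbb{E}_{\pi}\!\Big[ \sum_{t} r(s_{t}) - \lambda \, \mathrm{KL}\big(\pi(\cdot \mid s_{t}) \,\|\, \pi^{d}(\cdot \mid s_{t})\big) \Big].

Writing the desirability function z(s) = \exp\!\big(v^{*}(s)/\lambda\big), where P^{d} denotes state transitions under the default policy, the Bellman equation becomes linear in z,

z(s) = e^{r(s)/\lambda} \sum_{s'} P^{d}(s' \mid s)\, z(s'),

and the optimal policy follows as \pi^{*}(s' \mid s) \propto P^{d}(s' \mid s)\, z(s'). Under the usual compositionality result for such problems, desirability functions for component tasks combine linearly, z = \sum_{i} w_{i} z_{i}, to solve a weighted composite task exactly; this is the sense in which multiple value functions can be composed optimally.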