A hardwired neural circuit for temporal difference learning

Abstract

The neurotransmitter dopamine plays a major role in learning by acting as a teaching signal to update the brain's predictions about rewards. A leading theory proposes that this process is analogous to a reinforcement learning algorithm called temporal difference (TD) learning, and that dopamine acts as the error term within the TD algorithm (TD error). Although many studies have demonstrated similarities between dopamine activity and TD errors, the mechanistic basis for dopaminergic TD learning remains unknown. Here, we combined large-scale neural recordings with patterned optogenetic stimulation to examine whether and how the key steps in TD learning are accomplished by the circuitry connecting dopamine neurons and their targets. Replacing natural rewards with optogenetic stimulation of dopamine axons in the nucleus accumbens (NAc) in a classical conditioning task gradually generated TD error-like activity patterns in dopamine neurons by specifically modifying the task-related activity of NAc neurons expressing the D1 dopamine receptor (D1 neurons). In turn, patterned optogenetic stimulation of NAc D1 neurons in naïve animals drove dopamine neuron spiking according to the TD error of the stimulation pattern, indicating that TD computations are hardwired into this circuit. The transformation from D1 neurons to dopamine neurons could be described by a biphasic linear filter, with a rapid positive and delayed negative phase, that effectively computes a temporal difference. This finding suggests that the time horizon over which the TD algorithm operates—the temporal discount factor—is set by the balance of the positive and negative components of the linear filter, pointing to a circuit-level mechanism for temporal discounting. These results provide a new conceptual framework for understanding how the computations and parameters governing animal learning arise from neurobiological components.
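The two computations described in the abstract can be sketched in code. The first function below is the standard TD(0) error from reinforcement learning, which the abstract proposes dopamine activity resembles. The second illustrates, under simplifying assumptions, how a biphasic linear filter with a rapid positive lobe and a delayed negative lobe computes a discounted temporal difference of its input; the specific kernel `[+1, -gamma]` and the single-step delay are illustrative choices, not the filter estimated in the paper.

```python
import numpy as np

def td_error(V, r, gamma=0.9):
    """Standard TD(0) error: delta_t = r_t + gamma * V_{t+1} - V_t.

    V : array of value estimates over time steps within a trial.
    r : array of rewards at each time step.
    """
    V_next = np.append(V[1:], 0.0)   # V at t+1; value taken as 0 past the trial end
    return r + gamma * V_next - V

def biphasic_filter_output(x, gamma=0.9):
    """Convolve input x with a toy biphasic kernel [+1, -gamma]:
    a fast positive phase followed one step later by a negative phase.

    The output at time t is x_t - gamma * x_{t-1}, i.e. a discounted
    temporal difference, so the positive/negative balance (here, gamma)
    plays the role of the temporal discount factor.
    """
    kernel = np.array([1.0, -gamma])            # positive then delayed negative phase
    return np.convolve(x, kernel)[: len(x)]     # causal output, trimmed to input length
```

For a step input (e.g. sustained value-related activity appearing mid-trial), the toy filter produces a large positive transient at the step and then decays toward `1 - gamma`, qualitatively matching a TD-error-like phasic response; the closer `gamma` is to 1 (a fully balanced filter), the more the output approximates a pure temporal derivative.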