A hardwired neural circuit for temporal difference learning

Abstract

The neurotransmitter dopamine plays a major role in learning by acting as a teaching signal to update the brain's predictions about rewards. A leading theory proposes that this process is analogous to a reinforcement learning algorithm called temporal difference (TD) learning, and that dopamine acts as the error term within the TD algorithm (TD error). Although many studies have demonstrated similarities between dopamine activity and TD errors, the mechanistic basis for dopaminergic TD learning remains unknown. Here, we combined large-scale neural recordings with patterned optogenetic stimulation to examine whether and how the key steps in TD learning are accomplished by the circuitry connecting dopamine neurons and their targets. Replacing natural rewards with optogenetic stimulation of dopamine axons in the nucleus accumbens (NAc) in a classical conditioning task gradually generated TD error-like activity patterns in dopamine neurons by specifically modifying the task-related activity of NAc neurons expressing the D1 dopamine receptor (D1 neurons). In turn, patterned optogenetic stimulation of NAc D1 neurons in naïve animals drove dopamine neuron spiking according to the TD error of the stimulation pattern, indicating that TD computations are hardwired into this circuit. The transformation from D1 neurons to dopamine neurons could be described by a biphasic linear filter, with a rapid positive and delayed negative phase, that effectively computes a temporal difference. This finding suggests that the time horizon over which the TD algorithm operates—the temporal discount factor—is set by the balance of the positive and negative components of the linear filter, pointing to a circuit-level mechanism for temporal discounting. These results provide a new conceptual framework for understanding how the computations and parameters governing animal learning arise from neurobiological components.
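The two computations described in the abstract can be sketched in code. The first function below is the standard TD(0) error from reinforcement learning, which the abstract proposes dopamine activity resembles. The second illustrates, under simplifying assumptions, how a biphasic linear filter with a rapid positive lobe and a delayed negative lobe computes a discounted temporal difference of its input; the specific kernel `[+1, -gamma]` and the single-step delay are illustrative choices, not the filter estimated in the paper.

```python
import numpy as np

def td_error(V, r, gamma=0.9):
    """Standard TD(0) error: delta_t = r_t + gamma * V_{t+1} - V_t.

    V : array of value estimates over time steps within a trial.
    r : array of rewards at each time step.
    """
    V_next = np.append(V[1:], 0.0)   # V at t+1; value taken as 0 past the trial end
    return r + gamma * V_next - V

def biphasic_filter_output(x, gamma=0.9):
    """Convolve input x with a toy biphasic kernel [+1, -gamma]:
    a fast positive phase followed one step later by a negative phase.

    The output at time t is x_t - gamma * x_{t-1}, i.e. a discounted
    temporal difference, so the positive/negative balance (here, gamma)
    plays the role of the temporal discount factor.
    """
    kernel = np.array([1.0, -gamma])            # positive then delayed negative phase
    return np.convolve(x, kernel)[: len(x)]     # causal output, trimmed to input length
```

For a step input (e.g. sustained value-related activity appearing mid-trial), the toy filter produces a large positive transient at the step and then decays toward `1 - gamma`, qualitatively matching a TD-error-like phasic response; the closer `gamma` is to 1 (a fully balanced filter), the more the output approximates a pure temporal derivative.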