Striatal Gradient in Value-Decay Explains Regional Differences in Dopamine Patterns and Reinforcement Learning Computations

Abstract

Dopamine (DA) has been suggested to encode reward-prediction error (RPE) in reinforcement learning (RL) theory, but it has also been shown to exhibit heterogeneous patterns depending on region and condition: some signals ramp up toward a predictable reward whereas others respond only to the reward-predicting cue, and some lack a response to unpredictable reward. It remains elusive how these heterogeneities relate to the various RL algorithms suggested to be employed by animals and humans, such as RL under predictive state representation, hierarchical RL, and distributional RL. Here we show that these relations can be coherently explained if the decay of learned values (value-decay), implemented by the decay of dopamine-dependent plastic changes in synaptic strengths, is considered. First, we show that value-decay causes ramping RPE under traditional non-predictive representations but not under the predictive successor representation (SR). This explains the observed gradual fading of dopamine ramping over repeated reward navigation as a consequence of the gradual formation of SR. Next, we constructed a hierarchical RL model that couples two systems, one with and one without value-decay. The model explains the observed distinct patterns of neuronal activity in parallel striatal-dopamine circuits and their suggested distinct functions in flexible learning versus stable habit. Lastly, we examined two distinct algorithms of distributional RL, with and without value-decay. These algorithms explain how the reported distinct dopamine patterns in different striatal regions are linked to the suggested different strengths of distributional coding in these regions. These results suggest that within-striatum differences, or more specifically a medial-lateral gradient in value/synaptic decay, tune regionally different RL computations by generating distinct patterns of DA/RPE.
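The first claim in the abstract, that value-decay produces a ramping RPE under a non-predictive (one value per state) representation, can be illustrated with a minimal sketch. The code below is not the authors' model: it assumes a plain TD(0) learner on a linear track with a single reward at the end, and the function name `simulate_td`, the decay factor `kappa`, and all parameter values are hypothetical choices made only for illustration. Every stored value is multiplied by `kappa` after each time step; `kappa = 1.0` means no decay.

```python
def simulate_td(n_states=7, n_episodes=500, alpha=0.5, gamma=0.97, kappa=1.0):
    """TD(0) on a linear track with optional value-decay (illustrative only).

    The agent walks from state 0 to state n_states - 1 and receives a reward
    of 1 on leaving the last state. Returns the RPE at each state in the
    final episode. All parameter values are assumptions, not fitted values.
    """
    values = [0.0] * n_states
    rpes = [0.0] * n_states
    for _ in range(n_episodes):
        for t in range(n_states):
            is_last = t == n_states - 1
            reward = 1.0 if is_last else 0.0
            next_value = 0.0 if is_last else values[t + 1]
            # Temporal-difference reward-prediction error at state t
            rpes[t] = reward + gamma * next_value - values[t]
            # RPE-dependent update of the visited state's value
            values[t] += alpha * rpes[t]
            # Value-decay: every learned value shrinks a little at each step
            values = [kappa * v for v in values]
    return rpes


if __name__ == "__main__":
    no_decay = simulate_td(kappa=1.0)
    with_decay = simulate_td(kappa=0.99)
    print("RPE without decay:", [round(d, 3) for d in no_decay])
    print("RPE with decay:   ", [round(d, 3) for d in with_decay])
```

Under these assumptions, the RPE without decay vanishes at every state once the values have converged, whereas with decay the values never fully catch up, so a positive RPE persists at every state and grows toward the reward location. The sketch covers only the non-predictive case; it does not model the successor representation, the hierarchical model, or distributional RL.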
