A unified derivative-like dopaminergic computation across valences
Abstract
Dopamine activity in the brain affects decision-making and adaptive behaviors. A wealth of studies indicates that dopamine activity encodes the discrepancy between actual and predicted reward, leading to the reward prediction error (RPE) hypothesis. Specifically, it has been claimed that mesolimbic dopamine activity conforms to temporal-difference reward prediction error (TD RPE), a teaching signal in machine learning algorithms. Recently, growing evidence has suggested that dopamine is also involved in learning during aversive situations. However, the fundamental computation of dopamine activity in aversive situations remains unknown. A plausible but untested hypothesis is that dopamine activity in aversive situations also encodes TD RPE. Here, we tested this hypothesis using mice in virtual reality. Mice were trained to avoid electrical tail shocks by running out of a virtual shock zone. Using probe conditions with speed manipulation or teleportation, we revealed that the dopamine signal in the ventral striatum follows the temporal derivative form of a value function. Delivering a reward at the end of the track enabled us to observe the integration of aversion and reward in a derivative form. Moreover, the value function unbiasedly estimated from the recorded signals is consistent with the initially hypothesized form and realistically reflects the received shock distribution. Taken together, our results show that mesolimbic dopamine activity can operate as a unified teaching signal in natural situations with positive and negative valences.
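To make the derivative-like quantity concrete, the sketch below illustrates the standard TD RPE, δ = r + γV(s') − V(s), on a hypothetical linear track with an aversive outcome (shock) and an appetitive outcome (reward), including a teleportation-style probe. This is not the authors' code or model; all state labels, outcome magnitudes, and parameter values are assumptions chosen only for illustration.

```python
# Minimal sketch (assumed parameters, not the authors' model): how a TD reward
# prediction error behaves like the temporal derivative of a value function
# across negative and positive valences, and under a teleportation-style probe.

import numpy as np

n_states = 20                 # positions along a hypothetical linear track
gamma = 0.99                  # discount factor (assumed)
alpha = 0.1                   # learning rate (assumed)
shock_state = 5               # aversive outcome (negative reward) at this position
reward_state = n_states - 1   # appetitive outcome at the end of the track

def reward(next_state: int) -> float:
    """Hypothetical outcome schedule: a shock early on, a reward at the end."""
    if next_state == shock_state:
        return -1.0
    if next_state == reward_state:
        return +1.0
    return 0.0

# Learn a tabular value function with standard TD(0) over repeated traversals.
V = np.zeros(n_states)
for _ in range(500):
    for s in range(n_states - 1):
        delta = reward(s + 1) + gamma * V[s + 1] - V[s]
        V[s] += alpha * delta

# Probe 1: normal traversal. With V converged, the TD RPE at fully predicted
# transitions is near zero; phasic deviations track the received outcome plus
# the change (temporal derivative) in V.
normal_rpe = [reward(s + 1) + gamma * V[s + 1] - V[s] for s in range(n_states - 1)]

# Probe 2: teleportation. An unexpected jump from state a to state b produces a
# signal proportional to the value difference gamma*V(b) - V(a), i.e., a
# derivative-like burst even though no outcome is delivered at the jump.
a, b = 2, 10
teleport_rpe = gamma * V[b] - V[a]

print("Learned V:", np.round(V, 2))
print("RPE on normal traversal:", np.round(normal_rpe, 2))
print(f"RPE on teleport {a}->{b}:", round(teleport_rpe, 2))
```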