A novel critic signal in identified midbrain dopaminergic neurons of mice training in operant tasks
Abstract
In the canonical interpretation of dopaminergic neuron activity during Pavlovian conditioning, cell firing is initially triggered by unexpected rewards. Upon learning, activation instead follows the reward-predictive conditioned stimulus, and when expected rewards are withheld, firing is inhibited. However, little is known about dopaminergic neuron activity during the actual learning process in complex operant tasks. Here, we recorded optogenetically identified dopaminergic neurons of the ventral tegmental area (VTA) in mice training in multiple, successive operant sensory discrimination tasks. A delay between nose-poke choices and trial outcome signals (for reward or punishment) probed for predictive activity. During training, before criterion performance was reached, firing rates distinguished correct from incorrect choices prior to the outcome signals. Thus, the neurons predicted whether choices would be rewarded, despite the animals’ subthreshold behavioral performance. Surprisingly, these neurons also fired after reward delivery, as if the rewards had been unexpected according to the canonical view, but activity was inhibited after punishment signals, as if the reward had been expected after all. These inconsistencies suggest that theoretical formulations of dopaminergic neuronal activity should be revised to embody multiple roles in temporal difference learning and actor-critic models. Furthermore, on training trials when these neurons predicted that a given choice was correct and would be rewarded, the mice surprisingly adhered to other non-rewarded and untrained task strategies (e.g., spatial alternation). The DA neurons’ reward prediction activity could serve as critic signals for the choices just made. This is consistent with the notion that the brain must reconcile multiple Bayesian belief representations during learning.
Significance statement
The canonical view of dopaminergic function, based on classical conditioning studies, invokes reward-prediction error (RPE) signaling. Here, in mice performing a series of novel operant tasks with a delay between behavioral responses and reward/punishment signals, some neurons fired differentially after correct vs incorrect responses, but prior to the trial outcome (reward/punishment) signal. Nevertheless, the animals performed at chance levels, employing behavioral strategies other than the one signaled by these neurons. Furthermore, these same neurons showed canonical RPE responses: firing increased after reward signals (typically interpreted as the reward being unexpected) and decreased after punishment signals (interpreted as the reward having been expected). These findings indicate that dopaminergic neurons can participate in diverse functions underlying learning different behavioral strategies.
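For readers unfamiliar with the canonical RPE account referred to above, the following is a minimal textbook-style temporal difference sketch, not the authors' model or analysis. The three-step trial structure (cue, delay, outcome), the state names, and the values of `alpha` and `gamma` are illustrative assumptions. The error at each step is computed as reward plus the discounted value of the next state minus the value of the current state.

```python
# Minimal TD(0) sketch of the canonical reward-prediction error (RPE) account.
# Assumed trial structure (hypothetical, for illustration only):
#   baseline -> cue -> delay -> outcome (reward delivered or withheld)
# The baseline and terminal states have value 0 because cue timing is
# assumed to be unpredictable and nothing follows the outcome.

def run_trial(values, reward, alpha=0.1, gamma=1.0, learn=True):
    """Return TD errors at cue onset, delay onset, and outcome for one trial.

    values: dict holding the learned values of the 'cue' and 'delay' states.
    reward: reward received at outcome (1.0 delivered, 0.0 withheld).
    """
    # delta = r + gamma * V(next state) - V(current state)
    d_cue = gamma * values["cue"] - 0.0                 # baseline -> cue
    d_delay = gamma * values["delay"] - values["cue"]   # cue -> delay
    d_outcome = reward - values["delay"]                # delay -> terminal

    if learn:
        # Each state's value moves toward the target by a fraction alpha.
        values["cue"] += alpha * d_delay
        values["delay"] += alpha * d_outcome
    return d_cue, d_delay, d_outcome


def train(n_trials, alpha=0.1, gamma=1.0):
    """Train on always-rewarded trials and return the learned state values."""
    values = {"cue": 0.0, "delay": 0.0}
    for _ in range(n_trials):
        run_trial(values, reward=1.0, alpha=alpha, gamma=gamma)
    return values


if __name__ == "__main__":
    naive = {"cue": 0.0, "delay": 0.0}
    print("naive, rewarded:  ", run_trial(naive, 1.0, learn=False))
    trained = train(2000)
    print("trained, rewarded:", run_trial(trained, 1.0, learn=False))
    print("trained, omitted: ", run_trial(trained, 0.0, learn=False))
```

In this sketch the error occurs at the outcome before learning, shifts to the cue after learning, and becomes negative at the outcome when a predicted reward is withheld. A fully predicted reward yields an error near zero at the outcome, which is the canonical expectation against which the post-reward firing reported here is contrasted.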