Learning of probabilistic punishment as a model of anxiety produces changes in action but not punisher encoding in the dmPFC and VTA

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    Punishment is a key form of learning and behavior change, yet its core behavioral and brain mechanisms remain poorly understood, and certainly less well understood than reward learning. This manuscript uses dual fiber photometry to make an important advance in understanding how punishment is learned by studying how punishment changes action and punisher coding in the medial prefrontal cortex and ventral tegmental area of rats. The authors interpret the results as supporting a role for both areas in foraging in the face of risky outcomes. This work follows nicely on prior work and presents a straightforward and interesting experiment, using a validated anxiolytic to test which components of the neural response are related to this emotional component.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their name with the authors.)

Abstract

Previously, we developed a novel model of anxiety during motivated behavior by training rats to perform a task in which actions executed to obtain a reward were probabilistically punished, and observed that after learning, neuronal activity in the ventral tegmental area (VTA) and dorsomedial prefrontal cortex (dmPFC) represents the relationship between action and punishment risk (Park and Moghaddam, 2017). Here, we used male and female rats to expand on the previous work by focusing on neural changes in the dmPFC and VTA that were associated with the learning of probabilistic punishment and with anxiolytic treatment with diazepam after learning. We find that adaptive neural responses of the dmPFC and VTA during the learning of anxiogenic contingencies are independent of the punisher experience and occur primarily during the peri-action and reward periods. Our results also identify peri-action ramping of VTA neural calcium activity, and VTA-dmPFC correlated activity, as potential markers for the anxiolytic properties of diazepam.

Article activity feed

  1. Author Response:

    Reviewer 2 (Public Review):

    Weaknesses

    1. I had difficulty following the ANOVA results for Figure 1. I assume ANOVA was performed with factors of session and block. However, a single F statistic is reported, and I do not know what it refers to. It would be more appropriate to either perform a repeated measures ANOVA with session and block as factors for each dependent variable or, even better, a multivariate analysis of variance for the three dependent measures concurrently, and then report the univariate ANOVA results for each dependent measure. The graphs in Figure 1 (C-E) suggest a main effect of block, but as reported, I cannot tell if this is the case. Further, why was sex not included as an ANOVA factor?

    For the sake of transparency, we had included plots showing sessions split by block, whereas the statistics related to the right-side bar plots, in which data are collapsed across risk (done to minimize effects from ‘missing’ data). We appreciate that this may have caused confusion. In the revised manuscript we specify the exact figure for each statistical result, have added a better description to the methods, and have updated the statistics (Table 1) with the ANOVA and post-hoc results.

    Previously we had used a mixed effects model because one subject did not complete any risk trials in session 3, but in the revised manuscript we decided to remove that subject's sessions to permit RM ANOVA. As requested by the reviewer, we performed a multivariate analysis on risk and no-risk trials. Because of the repeated measures design, we opted to use the multRM function of the MANOVA.RM package developed by Friedrich et al. (2019), which permits multivariate analysis of repeated measures data with minimal assumptions and bootstrapped p-values for small sample sizes. We found significant interactions for session (or treatment) and risk (see tables below). This justifies the two-way univariate ANOVA, which is now reported in the rest of the manuscript. Sex differences were not included in the ANOVA because the study was not intended to assess sex differences but, rather, was designed according to NIH requirements for inclusion of males and females.

    Note: The MATS (modified ANOVA-type statistic) was used, following the recommendation of Friedrich et al. (2019) for cases in which the covariance matrix is singular, as was reported for these data. To replicate, use a random seed of 8675309.

    Package link: https://rdrr.io/github/smn74/MANOVA.RM/man/multRM.html
    Publication: Friedrich, S., Konietschke, F., & Pauly, M. (2019). Resampling-based analysis of multivariate data and repeated measures designs with the R package MANOVA.RM. The R Journal, 11(2), 380.
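    The univariate follow-up described above is a two-way repeated-measures ANOVA with session (or treatment) and risk as within-subject factors. As a minimal sketch, assuming a complete subjects × session × risk array (which is why a subject with an empty cell has to be dropped), the F ratios can be computed by hand with NumPy. The function name `rm_anova_2way` is hypothetical, p-values and sphericity corrections are omitted, and this is not the authors' analysis code.

```python
import numpy as np


def rm_anova_2way(y):
    """Two-way fully within-subjects ANOVA (illustrative sketch).

    y: array of shape (n_subjects, a_levels, b_levels), e.g. subjects x session x risk.
    Returns {effect: (F, df_effect, df_error)}; each effect is tested against
    its own interaction with subjects.
    """
    n, a, b = y.shape
    grand = y.mean()
    m_s = y.mean(axis=(1, 2))   # subject means, shape (n,)
    m_a = y.mean(axis=(0, 2))   # factor A (session) means, shape (a,)
    m_b = y.mean(axis=(0, 1))   # factor B (risk) means, shape (b,)
    m_ab = y.mean(axis=0)       # A x B cell means, shape (a, b)
    m_as = y.mean(axis=2)       # subject x A means, shape (n, a)
    m_bs = y.mean(axis=1)       # subject x B means, shape (n, b)

    ss_a = n * b * np.sum((m_a - grand) ** 2)
    ss_b = n * a * np.sum((m_b - grand) ** 2)
    ss_ab = n * np.sum((m_ab - m_a[:, None] - m_b[None, :] + grand) ** 2)
    # Error terms: effect-by-subject interactions
    ss_as = b * np.sum((m_as - m_s[:, None] - m_a[None, :] + grand) ** 2)
    ss_bs = a * np.sum((m_bs - m_s[:, None] - m_b[None, :] + grand) ** 2)
    resid = (y - m_as[:, :, None] - m_bs[:, None, :] - m_ab[None, :, :]
             + m_s[:, None, None] + m_a[None, :, None] + m_b[None, None, :] - grand)
    ss_abs = np.sum(resid ** 2)

    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_as, df_bs, df_abs = df_a * (n - 1), df_b * (n - 1), df_ab * (n - 1)
    return {
        "A": ((ss_a / df_a) / (ss_as / df_as), df_a, df_as),
        "B": ((ss_b / df_b) / (ss_bs / df_bs), df_b, df_bs),
        "AxB": ((ss_ab / df_ab) / (ss_abs / df_abs), df_ab, df_abs),
    }
```

    Averaging over risk and running a one-way repeated-measures ANOVA on session gives the same F for the session main effect, which is a quick sanity check on the decomposition.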

    2. The authors describe session 1 as characterized by 'overgeneralization' due to increased reward latencies. I do not follow this logic. Generalization typically refers to a situation in which a response to one action or cue extends to a second, similar action or cue. In the authors' design, there is only one cue and one action. I do not see how generalization is relevant here.

    This wording has been changed to “non-specific” in the results and discussion.

    3. The authors consistently report dmPFC and VTA 'neural activity'. The authors did not record neural activity. The authors recorded changes in fluorescence due to calcium influx into neurons. Even if these changes have similar properties to neural activity measured with single-unit recording, the authors did not record neural activity in this manuscript.

    We do not imply that we recorded unit activity in these studies and state in the introduction that fiber photometry is an indirect measure of neural activity. We have also reworded much of the text in the manuscript to use “calcium activity.”

    4. Comparing the patterns in Figures 2 and 3, it appears that dmPFC change in fluorescence was similar in non-shocked and shock trials up until shock delivery. However, the VTA patterns differ. No cue differences were observed for sessions 1-3 on shock trials (Figure 3A), yet differences were observed on non-shocked trials (Figure 2F). Further, changes in fluorescence between sessions 1 and 2/3 appeared to emerge just as foot shock would have been delivered. A split should be evident in Figure 3B - but it is not. Were these differences caused by sampling issues due to foot shock trials being rarer?

    We agree, although some of this could be because footshock trials were collapsed across blocks 2 and 3 (as no differences in shock response were observed between blocks). Nevertheless, we have added a caveat about cue responses to the results (see bottom of page 11 and top of page 15). Regarding the lack of a split in Figure 3A, this difference may be due to shock onset time. The permutation tests indicate that the differences in action activity in Figure 2B emerge about 0.5 to 0.8 seconds after the action, which is when the shock begins. It is therefore not surprising that the results in 2F do not match well with 3A, given the rapid and robust response to the footshock.
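    The response relies on permutation tests to locate when session differences emerge relative to the action. The exact procedure is not given here; as a minimal sketch, assuming trial-level traces aligned to the action, a per-time-point permutation test combined with a consecutive-sample criterion (one common convention in photometry work, not necessarily the authors') could look like this. The function names and the `n_perm`, `alpha`, and `min_run` values are illustrative.

```python
import numpy as np


def permutation_test_timewise(x, y, n_perm=1000, seed=0):
    """Per-time-point permutation test on two sets of trial traces.

    x: (n_x, n_time), y: (n_y, n_time), aligned to the same event.
    Returns one two-sided p-value per time point for the difference in means.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = np.abs(x.mean(axis=0) - y.mean(axis=0))
    exceed = np.zeros(x.shape[1])
    for _ in range(n_perm):
        # Shuffle trial labels, keeping whole traces intact
        idx = rng.permutation(pooled.shape[0])
        px, py = pooled[idx[:n_x]], pooled[idx[n_x:]]
        exceed += np.abs(px.mean(axis=0) - py.mean(axis=0)) >= observed
    # Add-one correction so a p-value is never exactly 0
    return (exceed + 1) / (n_perm + 1)


def first_sustained(p, alpha=0.05, min_run=5):
    """Index where the first run of >= min_run consecutive p < alpha starts, else None."""
    run = 0
    for i, sig in enumerate(p < alpha):
        run = run + 1 if sig else 0
        if run == min_run:
            return i - min_run + 1
    return None
```

    Converting the returned index to seconds (via the sampling rate and the alignment point) gives an onset estimate comparable to the 0.5 to 0.8 s window quoted above.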

    5. Similar to Figure 1, I could not follow the ANOVA results for the effects of diazepam treatment on trials completed, action latency and reward latency (Figure 4). Related, from what session do the bar plot data in Figure 4B come from? Is it the average of the 6% and 10% blocks? I cannot tell.

    Please see our response to comment 1 for the analysis relevant to this comment. Yes, the value for the risk blocks is the average of the 6% and 10% blocks. The methods now include a better explanation of what the bar plot data represent.

    6. For the diazepam experiment, did all rats receive saline and diazepam injections in separate sessions? If so, were these sessions counterbalanced? And further, how did the session numbers relate to sessions 1-3 of the first study? All of these details are extremely relevant to interpreting the results and comparing them to the first study, as session # appeared to be an important factor. For example - the decrease in dmPFC fluorescence to reward during the No-Risk block appeared to better match the fluorescent pattern seen in sessions 1 and 2 of the first experiment. In which case, the saline vs. diazepam difference was due to saline rats not showing the expected pattern of fluorescence.

    Subjects received saline and diazepam in separate sessions. Furthermore, diazepam was not tested until animals had completed at least 3 sessions of training (range: sessions 4-8). Clarification has been added to the methods.

    The new AUC analysis for Reviewer 1 allows for a better assessment of potential differences between earlier sessions and saline (see Figure 7, supplements 2 and 3). We also found the effect of diazepam on the reward response perplexing and somewhat modest. However, even when comparing only the saline and Session 3 PFC AUC data, we found no significant effect of session or session × risk interaction for action or reward (F values < 1.3, p values > .27).

    7. The authors seem convinced that fiber photometry is a surrogate for neural activity. Although significant correlation coefficients are found during action and reward, these values hover around 0.6 for the dmPFC and 0.75 for the VTA. Further, no correlations are observed for cue periods. A strength of the calcium imaging approach is that it permits the monitoring of specific neural populations. This would have been very valuable for the VTA, in which dopamine and GABA neurons must show very different patterns of activity. Opting for fiber photometry and then using a pan-neuronal approach fails to leverage the strength of the approach.

    The parent paper (Park & Moghaddam, 2017) used unit recording in this task (including reporting data from dopamine and non-dopamine VTA units). We assure the reviewer that we do not claim that fiber photometry is a perfect surrogate for direct recording of neural activity. However, a key question we wanted to answer in this study was whether the response of the PFC and VTA to the footshock changes during task acquisition (please see the last paragraph of the introduction), hence the choice to use fiber photometry. We note in the results and discussion that this approach is not optimal for detecting cue or other rapid responses (see pages 15 and 23).

    Reviewer 3 (Public Review):

    Probably the biggest overall issue is that it is unclear what is being learned specifically. There is no probe test at the end to dissociate the direct impact of shock from its learned impact. And the blocks are not signaled in some other way. And though there seems to be some evidence that the shock effects get more pronounced with a session, it is not clear if the rats are really learning to associate specific shock risks with the particular trials. Indeed with so few sessions and so few actual shocks, this seems really unlikely, especially since without an independent cue, the shock and its frequency is the cue for the block switch. It seems especially unlikely that there is a strong dichotomy in the rats' model of the environment between 6% and 10% blocks. This may be quite relevant for understanding foraging under risk. But I think it means some of the language in the paper about contingencies and the like should be avoided.

    While the parent paper (Park & Moghaddam, 2017) delved more deeply into this question, we agree that what exactly is learned may be difficult to ascertain. To address this (please also see our response to reviewer #1’s first comment), we have toned down our use of the term “contingency learning” throughout the manuscript and use the word “contingency” to refer to the underlying reinforcement/punishment schedules.

    The second issue I had was that I had some trouble lining up the claims in the results with what appeared to be meaningful differences in the figures. Just looking at it, it seems to me that VTA shows higher activities at higher shocks, particularly at the time of reward but also when comparing safe vs risky anyway for the cue and action periods. DmPFC shows a similar pattern in the reward period. […] But these results are not described at all like this. The focus is on the action period only and on ramping? I don't really see ramping. […] it says "Anxiogenic contingencies also did not influence the phasic response to reward...". But fig 3 seems to show clearly different reward responses? The characterization of the change is particularly important since to me it looks like the diazepam essentially normalizes these features of the response. This makes sense to me […].

    We initially believed that much of the difference in the reward period (with the exception of Session 2 in the PFC) reflected carryover of differences from the peri-action period. However, upon quantifying these responses again using AUC change scores to adjust for pre-event differences in the signal, we observed small reward-related increases (Figure 7, supplements 2 and 3) and have updated the results and discussion accordingly.

    Although some lessening of the reward response may be apparent across the diazepam session in the VTA (Figure 7, supplements 2 and 3, panel G), we do not have statistical support for this, as no significant differences were observed in permutation comparisons to saline, and only session 3 deviated from the first session for the reward period in the AUC analyses.
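    The response describes re-quantifying event responses as AUC change scores so that pre-event differences in the signal do not masquerade as event responses. The window definitions are not stated here, so the following is a minimal sketch under assumed, equal-length 1 s pre- and post-event windows and rectangle-rule integration; the function name and defaults are illustrative, not the authors' code.

```python
import numpy as np


def auc_change_score(trace, t, event_window=(0.0, 1.0), baseline_window=(-1.0, 0.0)):
    """Post-event area minus pre-event area (illustrative sketch).

    trace: 1-D signal (e.g. dF/F or z-score) aligned so the event is at t = 0.
    t: evenly spaced time stamps in seconds, same length as trace.
    Windows are half-open [start, end) in seconds; keeping them the same length
    makes the two areas directly comparable.
    """
    dt = t[1] - t[0]
    post = (t >= event_window[0]) & (t < event_window[1])
    pre = (t >= baseline_window[0]) & (t < baseline_window[1])
    # Rectangle-rule areas; a sustained offset present before the event cancels out
    return trace[post].sum() * dt - trace[pre].sum() * dt
```

    A trace that is elevated but flat across the event scores near zero, which is the property the AUC change score is meant to provide when earlier peri-action differences carry over into the reward period.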

  3. Reviewer #3 (Public Review):

    Punishment is a key form of learning and behavior change, yet its core behavioural and brain mechanisms remain poorly understood and certainly less well understood than reward learning. This manuscript by Jacobs et al. from the Moghaddam laboratory uses dual fibre photometry for calcium transients to make an important advance in understanding how punishment is learned by studying how punishment changes action and punisher coding in the PFC and VTA of rats. This work builds on the elegant single-unit work from this group reported previously. The authors use a single-action, probabilistic task whereby rats are first trained to nosepoke for sugar pellets on an FR1 schedule, with a 5-s DS signalling reinforcement. Then, in blocks of 30 trials each, the nosepoke is punished with a probability of 0%, 6%, or 10%. The authors used dual fibre photometry to concurrently record calcium transients in "dmPFC" and VTA, with a focus on transients related to action emission and the punisher as well as reward delivery.

    There are quite a few key findings here: 1) action transients in dmPFC change across punishment from modest inhibitory transients at 0% risk to no change (i.e., possible loss of the inhibitory transient in PFC) or modest positive transients (in VTA) as risk increased from 6% to 10%; 2) comparison with past single-unit data suggested similarity between photometry and single-unit measures for the action but not the DS; 3) there was no change in punisher transients in these regions; 4) diazepam, which had modest behavioral effects in alleviating punishment, had no effects on PFC transients to the action or punisher but did reveal peri-action ramping-like transients in VTA; 5) diazepam increased correlated activity between VTA and PFC at 0% and 6% risk.
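    Finding 5 concerns correlated activity between the two simultaneously recorded regions. The exact correlation measure is not specified here; a minimal sketch, assuming per-trial signals cut from the same peri-event window in both regions, is to compute a Pearson correlation within each trial and average the values after a Fisher z-transform. The function name and the averaging scheme are assumptions made for illustration.

```python
import numpy as np


def mean_trialwise_correlation(vta, pfc):
    """Average within-trial Pearson r between two simultaneously recorded signals.

    vta, pfc: arrays of shape (n_trials, n_samples) cut from the same peri-event
    window. Per-trial r values are Fisher z-transformed, averaged, and
    transformed back to the r scale.
    """
    rs = np.array([np.corrcoef(a, b)[0, 1] for a, b in zip(vta, pfc)])
    # Clip to avoid infinite z when r is exactly +/-1
    z = np.arctanh(np.clip(rs, -0.999999, 0.999999))
    return float(np.tanh(z.mean()))
```

    Computing this separately per block and per treatment (saline vs diazepam) would give the kind of comparison summarized in finding 5; whether within-trial or across-trial correlation is the more meaningful measure is a separate design choice.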

    Overall, I enjoyed reading this manuscript and I learned much from it. The work builds neatly and clearly on the past work of this group in this task, providing new information on how punishment shapes action coding in the prefrontal cortex and VTA, how it shapes correlated activity between these regions, and how benzodiazepines may affect these to achieve their anxiolytic effects. The critical conclusions are that these regions are important for action, but not punisher, encoding, and that peri-action ramping in VTA neurons and VTA-PFC correlated activity contribute to the anxiolytic effects of benzodiazepines in this task.

    Comments

    1. I think it is worth drawing the distinction between punishment (i.e. learning and performance) versus the punisher (footshock). For example, the title (and across the manuscript) refers to "punishment coding" to mean transients to the punisher itself. I would suggest using "punisher" when referring to the outcome used (footshock) or its associated transients and "punishment" when referring to learning. So, learning punishment involves changes in action but not punisher encoding in these regions.

    2. "dmPFC". Different researchers mean different things by this term. Would it be possible to state exactly where the fibres were instead (e.g., Laubach et al., eNeuro, 2018)?

    3. I did struggle to understand the functional significance of the PFC transients. I am convinced they are real and robust because we see precisely the same in our own unpublished work. But I am still puzzled as to what a loss of an 'inhibitory' transient around the punished action in PFC means. This is not really addressed, but it is the main effect of punishment on action coding in the PFC, and I think some readers would appreciate the authors' interpretation of this.

    4. Related to 3, it was also not clear why these PFC transients differed only at 6% risk and not also at 10% risk. Again, I think this is worth discussing.

    5. Re: analyses. I thought these were generally well done. There are two questions one might be interested in. The first is whether the transients are different from 0%. The second is whether transients differ across sessions. The figures do a good job of answering the second question (which to me is the most important question) by using coloured bars above transients to show when session differences are present as assessed by a robust analysis. However, I do think some readers would also appreciate knowing whether and when transients themselves were significantly < or > 0%. Perhaps these figures could be presented as supplementary data.

    6. The comparison with previously published single-unit data was very interesting. Here I was persuaded that these correlations were meaningful because of the difference between these correlations for cue and action. I am not suggesting the authors do the following, I only offer it for their consideration in future work. Kriegeskorte has developed ways of assessing dissimilarity in different data types from the same behavioural designs that could prove very helpful and persuasive here (e.g., Front. Syst. Neurosci., 24 November 2008; https://doi.org/10.3389/neuro.06.004.2008).

    7. The authors comment on the overgeneralisation of punishment learning. That is, in session 1 there is a broad suppression of behavior by punishment that was not obviously present in the remaining sessions. I am not sure overgeneralisation is the best term because this implies punishment learning generalised. More likely is that Pavlovian fear was present in session 1 to generally suppress nosepoking and this fear was reduced in the remaining sessions as the instrumental punishment contingency was learned. Bolles made this point some years ago and it may be worth citing Bolles et al. (1980), Learning and Motivation, 11(1), 78-96, on this point.
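    Comment 6 points to Kriegeskorte's representational similarity analysis. Purely as an illustration, assuming each data type (photometry and single units) can be summarized as one time-binned response per task condition (for example session × block × epoch), the core idea is to build a dissimilarity matrix for each data type and rank-correlate their upper triangles. The 1 - r dissimilarity, the condition layout, and the helper names are assumptions, not an analysis reported in the manuscript.

```python
import numpy as np


def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition patterns.

    patterns: (n_conditions, n_features), e.g. one time-binned response per condition.
    """
    return 1.0 - np.corrcoef(patterns)


def _ranks(v):
    # Simple ranking; assumes no ties, which is typical for continuous dissimilarities
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r


def compare_rdms(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles (diagonal excluded) of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    ra, rb = _ranks(rdm_a[iu]), _ranks(rdm_b[iu])
    return float(np.corrcoef(ra, rb)[0, 1])
```

    Because only the pattern of dissimilarities between conditions is compared, the two recording methods do not need to share units, time resolution, or signal scale.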

  4. Reviewer #2 (Public Review):

    The authors combined fiber photometry measurement of calcium transients in dorsomedial prefrontal cortex and ventral tegmental area with diazepam treatment in a task assessing risk behavior in rats. They observed risk-related changes in calcium transients around action and reward that were altered by diazepam. Further, diazepam worked to synchronize action and reward-related transients between the two regions. The strengths and weaknesses of the authors' manuscript are addressed below:

    • Strengths
    1. The rationale for studying these two regions is clear; in support, both show clear changes in calcium transients during the risk-assessment task.
    2. Comparing fluorescence correlation in the two regions during saline and diazepam treatment was clever and striking.

    • Weaknesses
    1. I had difficulty following the ANOVA results for Figure 1. I assume ANOVA was performed with factors of session and block. However, a single F statistic is reported, and I do not know what it refers to. It would be more appropriate to either perform a repeated measures ANOVA with session and block as factors for each dependent variable or, even better, a multivariate analysis of variance for the three dependent measures concurrently, and then report the univariate ANOVA results for each dependent measure. The graphs in Figure 1 (C-E) suggest a main effect of block, but as reported, I cannot tell if this is the case. Further, why was sex not included as an ANOVA factor?
    2. The authors describe session 1 as characterized by 'overgeneralization' due to increased reward latencies. I do not follow this logic. Generalization typically refers to a situation in which a response to one action or cue extends to a second, similar action or cue. In the authors' design, there is only one cue and one action. I do not see how generalization is relevant here.
    3. The authors consistently report dmPFC and VTA 'neural activity'. The authors did not record neural activity. The authors recorded changes in fluorescence due to calcium influx into neurons. Even if these changes have similar properties to neural activity measured with single-unit recording, the authors did not record neural activity in this manuscript.
    4. Comparing the patterns in Figures 2 and 3, it appears that dmPFC change in fluorescence was similar in non-shocked and shock trials up until shock delivery. However, the VTA patterns differ. No cue differences were observed for sessions 1-3 on shock trials (Figure 3A), yet differences were observed on non-shocked trials (Figure 2F). Further, changes in fluorescence between sessions 1 and 2/3 appeared to emerge just as foot shock would have been delivered. A split should be evident in Figure 3B - but it is not. Were these differences caused by sampling issues due to foot shock trials being rarer?
    5. Similar to Figure 1, I could not follow the ANOVA results for the effects of diazepam treatment on trials completed, action latency and reward latency (Figure 4). Related, from what session do the bar plot data in Figure 4B come from? Is it the average of the 6% and 10% blocks? I cannot tell.
    6. For the diazepam experiment, did all rats receive saline and diazepam injections in separate sessions? If so, were these sessions counterbalanced? And further, how did the session numbers relate to sessions 1-3 of the first study? All of these details are extremely relevant to interpreting the results and comparing them to the first study, as session # appeared to be an important factor. For example - the decrease in dmPFC fluorescence to reward during the No-Risk block appeared to better match the fluorescent pattern seen in sessions 1 and 2 of the first experiment. In which case, the saline vs. diazepam difference was due to saline rats not showing the expected pattern of fluorescence.
    7. The authors seem convinced that fiber photometry is a surrogate for neural activity. Although significant correlation coefficients are found during action and reward, these values hover around 0.6 for the dmPFC and 0.75 for the VTA. Further, no correlations are observed for cue periods. A strength of the calcium imaging approach is that it permits the monitoring of specific neural populations. This would have been very valuable for the VTA, in which dopamine and GABA neurons must show very different patterns of activity. Opting for fiber photometry and then using a pan-neuronal approach fails to leverage the strength of the approach.

  5. Reviewer #1 (Public Review):

    In the current study, the authors used photometry to record bulk calcium signal from dmPFC and VTA in awake behaving rats performing a "punishment risk task". In this task, the rats responded on a lever for reward after a 5s cue on an FR1 schedule across three 30-trial blocks in which the risk of shock on lever press increased from 0% to 6% and then 10%. Rats were trained on the food task, then recordings were made across the initial 3 sessions of training with shock. The authors show that trials completed and action latencies changed across blocks consistent with an increasing effect of shock on the behavior, and also changed across the sessions in a way that suggested some sort of learning related to punishment. Against this backdrop, they found that bulk calcium signal changed, generally increasing, with risk across blocks and also across sessions. This effect looks particularly prominent at high risk (10%) at the time of reward in both areas and also to the cue and action in VTA. Pre-session administration of diazepam normalized some of the performance measures in the shock blocks, and this was associated with reduction of the bulk signal in both areas where prior increases were seen (my interpretation of comparison of figures 2 and 5). Interestingly, signal at the time of the few shocks was not markedly different between blocks and was not heavily impacted by diazepam. The authors interpret the results as supporting a role for both areas in foraging in the face of risky outcomes and further suggest that the plasticity (and effects of diazepam) are not related directly to punishment but instead reflect changes in the peri-action period, as they term it.

    Overall I liked the paper. It follows nicely on prior work and presents a straightforward and interesting experiment, using a validated anxiolytic in the context of their task to test what components of the neural response are related to this emotional component. The results are quite interesting, I think, particularly in the 10% block, where significant increases in activity seem to evolve with learning and be reversed by diazepam. That said, I have a couple of concerns to consider.

    Probably the biggest overall issue is that it is unclear what is being learned specifically. There is no probe test at the end to dissociate the direct impact of shock from its learned impact. And the blocks are not signaled in some other way. And though there seems to be some evidence that the shock effects get more pronounced with a session, it is not clear if the rats are really learning to associate specific shock risks with the particular trials. Indeed with so few sessions and so few actual shocks, this seems really unlikely, especially since without an independent cue, the shock and its frequency is the cue for the block switch. It seems especially unlikely that there is a strong dichotomy in the rats' model of the environment between 6% and 10% blocks. This may be quite relevant for understanding foraging under risk. But I think it means some of the language in the paper about contingencies and the like should be avoided.
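    The point about how few shocks the rats actually experience can be made concrete with a little arithmetic. A minimal sketch, assuming the block structure described in the reviews (three 30-trial blocks at 0%, 6%, and 10% risk) and that every trial is completed, which overstates shock exposure for rats that omit trials; the function names are illustrative.

```python
import numpy as np

# Assumed block structure: (trials per block, shock probability per action)
BLOCKS = [(30, 0.00), (30, 0.06), (30, 0.10)]


def expected_shocks(blocks):
    """Expected number of shocked actions in one session if every trial is completed."""
    return sum(n * p for n, p in blocks)


def prob_no_shock(n, p):
    """Probability that a block of n trials with shock probability p contains no shock."""
    return (1.0 - p) ** n


def simulate_session(blocks, rng):
    """Shock count per block for one simulated session (independent Bernoulli trials)."""
    return [int(rng.binomial(n, p)) for n, p in blocks]


rng = np.random.default_rng(0)
print(round(expected_shocks(BLOCKS), 2))      # 4.8
print(round(prob_no_shock(30, 0.06), 3))      # 0.156
print(simulate_session(BLOCKS, rng))          # one random session, e.g. counts per block
```

    Under these assumptions a session delivers about 4.8 shocks on average, and roughly one session in six would contain no shock at all in the 6% block, which illustrates why the 6% and 10% blocks may be hard for rats to tell apart from shock experience alone.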

    The second issue I had was that I had some trouble lining up the claims in the results with what appeared to be meaningful differences in the figures. Just looking at it, it seems to me that VTA shows higher activities at higher shocks, particularly at the time of reward but also when comparing safe vs risky anyway for the cue and action periods. DmPFC shows a similar pattern in the reward period. This is interesting and would be consistent with sort of a contrast effect perhaps. But these results are not described at all like this. The focus is on the action period only and on ramping? I don't really see ramping. And at the top of para 3 in the discussion, it says "Anxiogenic contingencies also did not influence the phasic response to reward...". But fig 3 seems to show clearly different reward responses? The characterization of the change is particularly important since to me it looks like the diazepam essentially normalizes these features of the response. This makes sense to me, but if those are not the features of the response that are highlighted, then the diazepam data is harder to understand.
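    The reviewer notes difficulty seeing ramping in the figures. One simple, explicit way to operationalize peri-action ramping, offered here as an assumption rather than the authors' definition, is the least-squares slope of each trial's signal over a fixed window before the action; slopes that are consistently positive across trials would indicate ramping. The 2 s window and the function name are illustrative.

```python
import numpy as np


def ramp_slopes(traces, t, window=(-2.0, 0.0)):
    """Least-squares slope (signal units per second) of each trial in a pre-action window.

    traces: (n_trials, n_samples) aligned so the action occurs at t = 0.
    t: time stamps in seconds, length n_samples.
    A consistently positive slope is one simple operational definition of
    'ramping'; it says nothing about which cell types produce it.
    """
    win = (t >= window[0]) & (t < window[1])
    # np.polyfit fits each column of a 2-D y separately; row 0 holds the slopes
    return np.polyfit(t[win], traces[:, win].T, 1)[0]
```

    The per-trial slopes could then be compared between saline and diazepam sessions with the same statistics used for other measures, which would make a claim of ramping directly testable.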

    In any event, I think with a few changes in terminology and perhaps how the data is described this looks like a valuable and important result for those interested in this sort of complex task/model.