Sex differences in learning from exploration

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    Chen et al. trained male and female animals on an explore/exploit (2-armed bandit) task. Despite similar levels of accuracy in these animals, the authors report higher levels of exploration in male than in female mice. The patterns of exploration were analyzed in fine-grained detail, with the addition of computational modeling: males are less likely to stop exploring once exploring is initiated, whereas females stop exploring once they learn. The results are of broad interest to those interested in sex differences in learning. Inclusion of more primary behavioral data and further justification of the models and parameters are needed to clarify data presentation and interpretation.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

This article has been Reviewed by the following groups

Abstract

Sex-based modulation of cognitive processes could set the stage for individual differences in vulnerability to neuropsychiatric disorders. While value-based decision making processes in particular have been proposed to be influenced by sex differences, the overall correct performance in decision making tasks often shows variable or minimal differences across sexes. Computational tools allow us to uncover latent variables that define different decision making approaches, even in animals with similar correct performance. Here, we quantify sex differences in mice in the latent variables underlying behavior in a classic value-based decision making task: a restless two-armed bandit. While male and female mice had similar accuracy, they achieved this performance via different patterns of exploration. Male mice tended to make more exploratory choices overall, largely because they appeared to get ‘stuck’ in exploration once they had started. Female mice tended to explore less but learned more quickly during exploration. Together, these results suggest that sex exerts stronger influences on decision making during periods of learning and exploration than during stable choices. Exploration during decision making is altered in people diagnosed with addictions, depression, and neurodevelopmental disabilities, pinpointing the neural mechanisms of exploration as a highly translational avenue for conferring sex-modulated vulnerability to neuropsychiatric diagnoses.

Article activity feed

  1. Author Response:

    Reviewer #1:

    Chen et al. trained male and female animals on an explore/exploit (2-armed bandit) task. Despite similar levels of accuracy in these animals, the authors report higher levels of exploration in males than in females. The patterns of exploration were analyzed in fine-grained detail: males are less likely to stop exploring once exploring is initiated, whereas female mice stop exploring once they learn. The authors find that both learning rate (alpha) and noise parameter (beta) increase in exploration trials in a hidden Markov model (HMM). When reinforcement learning (RL) models were fitted to animal data, they report that females had a higher learning rate, which increased over days of testing, suggesting higher meta-learning in females. They also report that of the RL models they fit, the model incorporating a choice kernel updating rule was found to fit both male and female learning. The results do suggest one should pay greater attention to the influence of sex in learning and exploration. Another important takeaway from this study is that similar levels of accuracy do not imply similar strategies. Essential revisions include a request to show more primary behavioral data, to provide a rationale for the different RL models and their parameters, to clarify the difference between learning and 'steady state,' and to qualify how these experiments uniquely identify latent cognitive variables not previously explored with similar methods.

    We appreciate the reviewer’s thorough reading of the paper and hope that the changes we detail below will address these concerns.

    Reviewer #2:

    The authors investigated sex differences in the explore-exploit tradeoff using a drifting binary bandit task in rodents. The authors tried to claim that males and females use different means to achieve similar levels of accuracy in making explore-exploit decisions. In particular, they argue that females explore less but learn more quickly during exploration. The topic is very interesting, but I am not yet convinced by the conclusions.

    Here are my major points:

    1. This paper showed that males explore more than females, and through computational modeling, they showed that females have a higher learning rate compared to males. The fact that males explore more and have lower learning rates compared to females can be an interesting finding, as the paper tried to claim, but it can also be that female rats simply learn the task better than male rats in the task used.

    We have revised the manuscript to better demonstrate that male mice did not acquire fewer rewards than females, and included all analyses and plots requested in this review. Ultimately, there was no evidence that they learned the task any less well than the females did. We appreciated this comment because it has strengthened the evidence we were able to present that males and females take different paths to the same outcome. Completing these analyses has also allowed us to clarify the relationship between RL learning rates and performance in this classic dynamic decision-making task.

    (a) First, from Figure 1B, it looks like p(reward, chance) is similar between sexes, but visually the female rats' performance, p(reward, obtained), looks slightly better than the males'. It would be nice if the authors could show a bar plot comparison like in Figure 1C and 1E. A non-significant test here only fails to show sex differences in performance, but it cannot be concluded that there are no sex differences in performance here. Further evidence needs to be reported here to help readers see whether there are qualitative differences in performance at all.

    The requested bar plot has been added as Figure 1C and illustrates our central point: male mice did not acquire fewer rewards than females, so there is no evidence that they learned the task any less well than the females did. The t-test result we originally reported suggests that we can discard the hypothesis that males and females have different mean levels of percent reward obtained, but we take the reviewer’s point that the male and female distributions may differ in other, more subtle ways. Therefore, we conducted a more sensitive statistical test. The Kolmogorov-Smirnov (KS) test takes into account not only the means of the distributions but also their shapes. The null hypothesis is that both groups were sampled from populations with identical distributions. It tests for any violation of that null hypothesis -- different medians, different variances, or different distributions. The KS test suggested that males and females are not just not significantly different in their reward acquisition performance (Kolmogorov-Smirnov D = 0.1875, p = 0.94), but that males and females have the same distribution of performance.

    New text from the manuscript (page 5, line 119-128):

    “There was no significant sex difference in the probability of rewards acquired above chance (Figure 1C, main effect of sex, F(1, 30) = 0.05, p = 0.83). While the mean of percent reward obtained did not differ across sexes, we considered the possibility that the distribution of reward acquisition in males and females might be different. We conducted the Kolmogorov-Smirnov (KS) test, which takes into account not only the means of the distributions but also the shapes of the distributions. The KS test suggested that males and females are not just not significantly different in their reward acquisition performance (Kolmogorov-Smirnov D = 0.1875, p = 0.94), but that males and females have the same distributions for reward acquisition. This result demonstrates equivalently strong understanding and performance of the task in both males and females.”
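
    For readers unfamiliar with the statistic, the two-sample KS distance D reported above is the largest gap between the two groups' empirical cumulative distribution functions. A minimal sketch in plain Python follows; the function name `ks_statistic` and the example values are illustrative assumptions, not the authors' code or data.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance D (no p-value).

    D is the maximum absolute difference between the empirical CDFs
    of the two samples, evaluated at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up example values, for illustration only.
group_1 = [1, 2, 3, 4]
group_2 = [3, 4, 5, 6]
print(ks_statistic(group_1, group_2))
```

    A small D means the two empirical distributions track each other closely over their whole range, not only at the mean. In practice, `scipy.stats.ks_2samp` returns both D and a p-value.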

    (b) The exploration and exploitation states are defined by fitting a hidden Markov model. In the exploration phase, the agent chooses left and right randomly. From Figure 1E and 1F, it looks like male rats choose completely randomly 70% of the time (around 50% for females). The exploration state here is confounded with the state of pure guessing (poor performance).

    This comment seems to confuse our descriptive HMM with a generative model. The HMM does not imply that choices are being made randomly. Instead, exploratory choices are modeled as a uniform distribution over choices. This was done only because this is the maximum entropy distribution for a categorical variable -- the distribution that makes the fewest assumptions about the true underlying distribution and thus does not bias the model towards or away from any particular pattern of choices during exploration. For example, Ebitz et al. (2019) have shown that the HMM can recover periods of exploration that are highly structured and information-maximizing, despite being modeled in exactly this way.

    Because the model does not imply or require that exploratory choices are random, we could, in the future, ask whether these choices reflect random exploration or instead more directed forms of exploration. However, for various reasons, this task is not the ideal testbed for isolating random and directed exploration, though this is a direction we hope to go in the future. To clarify our model and address these issues for future research, we have added the following text (page 31, line 745-756):

    “The emissions model for the explore state was uniform across the options.

    This is simply the maximum entropy distribution for a categorical variable - the distribution that makes the fewest number of assumptions about the true distribution and thus does not bias the model towards or away from any particular type of high-entropy choice period. This doesn’t require, imply, impose, or exclude that decision-making happening under exploration is random. Ebitz et al. 2019 have shown that exploration was highly structured and information-maximizing, despite being modeled as a uniform distribution over choices (Ebitz et al., 2020, 2019). Because exploitation involves repeated sampling of each option, exploit states only permitted choice emissions that matched one option.”
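
    The state model described in this passage can be sketched as a three-state HMM: an explore state that emits each of the two options with probability 1/2, and two exploit states that each emit only one option. The sketch below labels trials with the Viterbi algorithm. The transition probabilities, the uniform initial distribution, and all names are hypothetical values chosen for illustration; the authors' fitted parameters and fitting procedure are not given here.

```python
import math

# States: 0 = explore, 1 = exploit-left, 2 = exploit-right.
# Choices: 0 = left, 1 = right.
EPS = 1e-12  # stands in for "not permitted" so that log() stays finite

EMISSIONS = [
    [0.5, 0.5],        # explore: uniform over the two options
    [1.0 - EPS, EPS],  # exploit-left: only emits left
    [EPS, 1.0 - EPS],  # exploit-right: only emits right
]

# Hypothetical transition matrix (assumption, not fitted values).
TRANSITIONS = [
    [0.6, 0.2, 0.2],
    [0.1, 0.9 - EPS, EPS],
    [0.1, EPS, 0.9 - EPS],
]
INITIAL = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform start


def viterbi(choices):
    """Return the most likely state sequence for a list of 0/1 choices."""
    n_states = len(INITIAL)
    score = [math.log(INITIAL[s]) + math.log(EMISSIONS[s][choices[0]])
             for s in range(n_states)]
    backpointers = []
    for c in choices[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best previous state for arriving in state s.
            prev = max(range(n_states),
                       key=lambda p: score[p] + math.log(TRANSITIONS[p][s]))
            pointers.append(prev)
            new_score.append(score[prev] + math.log(TRANSITIONS[prev][s])
                             + math.log(EMISSIONS[s][c]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


print(viterbi([0, 1, 0, 1, 0, 1]))  # alternating choices
print(viterbi([0] * 10))            # repeated left choices
```

    Note that the uniform emission only says the model is agnostic about which option is chosen during exploration; the labels are driven by how choices are sequenced, which is why a structured alternating pattern can still be labeled as explore.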

    (c) Figure 2 basically says that you can choose randomly for two reasons: to be more "noisy" in your decisions (have a higher temperature term), or to ignore the values more (by having a learning rate of 0, you are just guessing). It would be nice to show a simulation of p(reward, obtained) by learning rate x inverse temperature (like in Figure 2C). From Figure 2B, it looks like higher learning rates mean better value learning in this task. It seems to me that it's more likely the male rats simply learn the task more poorly and behave more randomly, which shows up as more exploration in the HMM.

    This is an important comment and addressing it gave us a chance to show the complicated, nonlinear relationship between the learning rate term and performance in this task. Per the reviewer’s request, we now include a plot showing how learning rate (α) and inverse temperature (β) affect reward acquisition (Figure 3F). However, this figure demonstrates that a higher learning rate does not mean better performance in this task. Performing well in this task requires both the ability to learn new information and the ability to hang onto the information that has already been learned. That can only happen when learning rates are moderate, not maximal. When the learning rate is maximal, behavior is reduced to a win-stay lose-shift policy, where only the outcome of the previous trial is taken into account for choice. This actually results in a lower percent of the reward obtained. We have addressed the difference between the learning rate parameter in the reinforcement learning (RL) model and actual learning performance in the comment above. We believe that this new figure illustrates an essential point: different strategies can result in the same learning performance.

    This result shows that the male strategy was a valid one that doesn’t perform worse than the female strategy. Not only did they have identical performance (Figure 1C), but their optimized RL parameters put them both within the same predicted performance gradient in this new plot (Figure 3F). That’s exactly why we believe that it is important to understand differences in how individuals approach the same task, even as they may achieve the same overall levels of performance.

    New text from the manuscript (page 14, line 368-385):

    “While females had a significantly higher learning rate (α) than males, they did not obtain more rewards than males. This is because the learning rate parameter in an RL model does not equate to the learning performance, which is better measured by the number of rewards obtained. The learning rate parameter reflects the rate of value updating from past outcomes. Performing well in this task requires both the ability to learn new information and the ability to hang onto the previously learned information. That occurs when the learning rate is moderate but not maximal. When the learning rate is maximal (α = 1), only the outcome of the immediate past trial is taken into account for the current choice. This essentially reduces the strategy to a win-stay lose-shift strategy, where choice is fully dependent on the previous outcome. A higher learning rate in an RL model does not translate to better reward acquisition performance. To illustrate that different combinations of learning rate and decision noise can result in the same reward acquisition performance, we conducted computer simulations of 10,000 RL agents defined by different combinations of learning rate (α) and inverse temperature (β) and plotted their reward acquisition performance for the restless bandit task (Figure 3F). This figure demonstrates that 1) different learning rate and inverse temperature combinations can result in similar performance, and 2) the optimal reward acquisition is achieved when the learning rate is moderate. This result suggested that not only did males and females have identical performance, but their optimized RL parameters also put them both within the same predicted performance gradient in this plot.”
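
    The kind of simulation described in this passage can be sketched as follows: a delta-rule agent with learning rate α and a softmax choice rule with inverse temperature β, run on a two-armed bandit whose reward probabilities drift over time. This is a minimal sketch under stated assumptions, not the authors' simulation code. The drift schedule (`step`, `hazard`), the starting values of 0.5, the trial count, and all function names are hypothetical.

```python
import math
import random


def softmax_p_left(q_left, q_right, beta):
    """Probability of choosing left under a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-beta * (q_left - q_right)))


def update_value(q, reward, alpha):
    """Delta-rule update: move q toward the reward by a fraction alpha."""
    return q + alpha * (reward - q)


def drift(p, rng, step=0.1, hazard=0.1):
    """Assumed random walk: with probability `hazard`, move p by +/- step."""
    if rng.random() < hazard:
        p += step if rng.random() < 0.5 else -step
    return min(1.0, max(0.0, p))


def simulate_agent(alpha, beta, n_trials=300, seed=0):
    """Return the fraction of rewarded trials for one simulated agent."""
    rng = random.Random(seed)
    probs = [rng.random(), rng.random()]  # hidden reward probabilities
    q = [0.5, 0.5]                        # assumed initial value estimates
    total = 0
    for _ in range(n_trials):
        choice = 0 if rng.random() < softmax_p_left(q[0], q[1], beta) else 1
        reward = 1 if rng.random() < probs[choice] else 0
        q[choice] = update_value(q[choice], reward, alpha)
        total += reward
        probs = [drift(p, rng) for p in probs]
    return total / n_trials


# Sweep a small grid of (alpha, beta) pairs, averaging over a few seeds.
for alpha in (0.1, 0.5, 1.0):
    for beta in (1.0, 5.0):
        mean = sum(simulate_agent(alpha, beta, seed=s) for s in range(20)) / 20
        print(f"alpha={alpha:.1f} beta={beta:.1f} p(reward)={mean:.3f}")
```

    With α = 1 the update overwrites the chosen option's value with the most recent outcome, which is why the text describes maximal learning rates as discarding everything except the previous trial.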

    (d) From Figure 3E, it looks like female rats learn better across days but male rats do not, but I am not sure. If you plot p(reward, obtained) vs. time (days), do you see an improvement in female rats as opposed to males? Figure 4 also showed that females show more win-stay-lose-shift behavior and use past information more; both are indicators of better learning in this task.

    Taking the above together, I am not convinced about the strategic sex differences in exploration; it looks more like the female rats simply learn better in this task.

    Unfortunately, there was no change in performance across days in either males or females. Per the reviewer's request, we now include a new plot illustrating p(reward, obtained) over days in Supplemental Figure 1. Ultimately, this figure is consistent with the points we clarified above: males and females had identical performance in this task.

    To the other points raised here, about sex differences in win-stay lose-shift and mutual information: these are the strategic differences at the heart of the paper, but again they did not alter overall performance for the reasons detailed above. Figure 4 did show that females were doing more win-stay. However, after further examining win-stay behavior by explore-exploit states, we found that females were only doing more win-stay during exploratory trials (Figure 5E). There was no difference in win-stay during the exploitative trials. Figure 5F also demonstrated that females did more win-stay lose-shift in the exploration state, indicating that females only learned better during exploration. Although males learned more slowly during exploration, they compensated for that by exploring for longer. Both male and female strategies are equally effective and may be differentially advantageous in different tasks.
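
    The win-stay and lose-shift measures discussed here can be computed directly from the choice and reward sequences. The sketch below is an illustration, not the authors' analysis code: the function name, the optional `include` mask for restricting the count to trials carrying a given state label, and the convention of applying the mask to the current trial are all assumptions.

```python
def win_stay_lose_shift(choices, rewards, include=None):
    """Return (p_win_stay, p_lose_shift) for one session.

    choices: list of option indices, one per trial.
    rewards: list of 0/1 outcomes, one per trial.
    include: optional list of booleans; trial t is counted only when
             include[t] is True (e.g. trials labeled as exploratory).
    A proportion is None when there are no qualifying trials.
    """
    wins = win_stay = losses = lose_shift = 0
    for t in range(1, len(choices)):
        if include is not None and not include[t]:
            continue
        stayed = choices[t] == choices[t - 1]
        if rewards[t - 1]:       # previous trial was rewarded
            wins += 1
            win_stay += stayed
        else:                    # previous trial was not rewarded
            losses += 1
            lose_shift += not stayed
    p_win_stay = win_stay / wins if wins else None
    p_lose_shift = lose_shift / losses if losses else None
    return p_win_stay, p_lose_shift


# Made-up sequences, for illustration only.
choices = [0, 1, 1, 0]
rewards = [1, 1, 0, 0]
print(win_stay_lose_shift(choices, rewards))
```

    Passing a mask built from state labels separates the exploratory from the exploitative trials, which is the comparison the response describes for Figure 5E and 5F.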

    Finally, to address the meta-learning: in developing our response to this comment and looking for any other signs of adaptation across days (sex-dependent or not), we revisited this result and decided to rewrite some passages to be more circumscribed about our interpretations. Figure 3E showed increased learning rate parameters across days in females. We were initially excited about this idea of meta-learning; however, we found no other evidence of adaptation over time in multiple behavioral measures, including reward acquisition, response time, and retrieval time (Supplemental Figure 1). Changes in learning rate parameters over sessions from the RL model were marginally significant, and we feel that they are worth mentioning for completeness, but they were only a small contributor to the overall sex differences in the behavioral profile. As a result, we have toned down the conclusion we drew from this result accordingly.

    New text from the manuscript (page 4, line 93-113):

    “It is worth noting that unlike other versions of bandit tasks such as the reversal learning task, in the restless bandit task, animals were encouraged to continuously learn about the most rewarding choice(s). There is no asymptotic performance during the task because the reward probability of each choice constantly changes. The performance is best measured by the amount of obtained reward. Prior to data collection, both male and female mice had learned to perform this task in the touchscreen operant chamber. To examine whether mice had learned the task, we first calculated the average probability of reward acquisition across sessions in males and females (Supplemental Figure 1A). There were no significant changes in the reward acquisition performance across sessions in either sex, demonstrating that both males and females had learned to perform the task and had reached an asymptotic level of performance across sessions (two-way repeated measures ANOVA, main effect of session, p = 0.71). Then we examined two other primary behavioral metrics across sessions that are associated with learning: response time and reward retrieval time (Supplemental Figure 1B, C). Response time was calculated as the time elapsed between the display onset and the time when the nose poke response was completed. Reward retrieval time was measured as the time elapsed between nose-poke response and magazine entry for reward collection. There was no significant change in response time (two-way repeated measures ANOVA, main effect of session, p = 0.39) or reward retrieval time (main effect of session, p = 0.71) across sessions in either sex, which again demonstrated that both sexes had learned how to perform the task. Since both sexes had learned to perform the task prior to data collection, variabilities in task performance are results of how animals learned and adapted their choices in response to the changing reward contingencies.”

    page 14, line 386-390:

    “One interesting finding is that, when learning rate was compared across sessions within sex, females, but not males, showed an increased learning rate with experience on the task (Figure 3G, repeated measures ANOVA, female: main effect of time, F(2.26, 33.97) = 5.27, p = 0.008; male: main effect of time, F(2.5, 37.52) = 0.23, p = 0.84). This points to potential sex differences in meta-learning that could contribute to the differential strategies across sexes.”

    2. I do like how the authors define exploration states vs exploitation states via HMM using choices alone. It would be interesting to see how the sex differences in reaction time are modulated by exploration vs exploitation state. As the authors showed, RT in the exploration state is longer. Hence, it would make a conceptual difference whether the sex difference in reaction times is due to different proportions of time spent on exploration vs exploitation across sexes.

    That is a very interesting idea. We tested for this possibility by calculating a two-way ANOVA (with interaction) between explore-exploit state and sex in predicting RT. There was a significant main effect of state (RT is longer in the explore state than in the exploit state, main effect of state: F(1, 30) = 13.07, p = 0.0011), but males were slower than females during both exploitation and exploration (main effect of sex, F(1, 30) = 14.15, p = 0.0007) and there was no significant interaction (F(1, 30) = 0.279, p = 0.60). Unfortunately, this means that we cannot interpret the response time difference between males and females as a consequence of the greater male tendency to explore. Response time is a fairly noisy primary behavior metric, especially in the males, and a lot of other factors might be at play here, some of which we plan to follow up on in the future. We report this result as follows (page 10, line 248-254):

    “Since males had more exploratory trials, which took longer, we tested the possibility that the sex difference in response time was due to prolonged exploration in males by calculating a two-way ANOVA between explore-exploit state and sex in predicting response time. There was a significant main effect of state (main effect of state: F(1, 30) = 13.07, p = 0.0011), but males were slower than females during both exploitation and exploration (main effect of sex, F(1, 30) = 14.15, p = 0.0007) and there was no significant interaction (F(1, 30) = 0.279, p = 0.60).”

    Reviewer #3:

    In the manuscript 'Sex differences in learning from exploration', Chen and colleagues investigated sex differences in decision making behavior during a two-armed spatial restless bandit task. Sex differences and exploration dysregulation have been observed in various neuropsychiatric disorders. Yet, it has been unclear whether sex differences in exploration and exploitation contribute to sex-linked vulnerabilities in neuropsychiatric disorders.

    Chen and colleagues applied comprehensive modeling (a model-free hidden Markov model (HMM) and various reinforcement learning (RL) models) and behavioral analysis (analysis of choice behavior using the latent variables extracted from the HMM) to answer this question. They found that male mice explored more than female mice and were more likely to spend an extended period of their time exploring before committing to a favored choice. In contrast, female mice were more likely to show elevated learning during the exploratory period, making exploration more efficient and allowing them to start exploiting a favored choice earlier.

    Overall, I find the question studied in this work interesting and compelling. Also, the results were convincing and the analysis thorough. However, the assumptions in the proposed HMM are not fully justified and additional analyses are needed to strengthen the authors' claims. To be more specific, the effect of obtained reward on state transitions and biased exploitation should be further explored.

    Thank you for your feedback. We have included two more complex versions of the hidden Markov model (HMM) that account for the effect of obtained reward on state transitions and for biased exploitation. Although the additional parameters slightly improved the model fit, model comparison tests suggested that the improvement was not significant. We decided to keep the HMM from the original manuscript because it is the simplest model and provides the best parameter estimation with the amount of data we have. We do appreciate the comments and believe that the inclusion of the two new HMMs and the justification of the original HMM have strengthened our claims.
