Learning from invariants predicts upcoming behavioral choice from spiking activity in monkey V1
This article has been reviewed by the following groups:
- Evaluated articles (eLife)
Abstract
Animals frequently make decisions based on sensory cues. In such settings, the overlap between the information about the stimulus and the information about the choice is crucial for the formation of informed behavioral decisions. Yet how stimulus and choice information interact in the brain is poorly understood. Here, we study the representation of a binary decision variable in the primary visual cortex (V1) while macaque monkeys perform a delayed match-to-sample task on naturalistic visual stimuli close to the psychophysical threshold. Using population vectors, we demonstrate an overlap between the decoding spaces for the binary stimulus classes "match/non-match" and the binary choices "same/different" of the animal. Leveraging this overlap, we learn from the information that is invariant across the two classification problems to predict the choice of the animal as a time-dependent population signal. We show the importance of the across-neuron organization and the temporal structure of spike trains for the decision signal, and suggest how noise correlations between neurons with similar decoding selectivity help the accumulation of the decision signal. Finally, we show that the decision signal is primarily carried by bursting neurons in the superficial layers of the cortex.
Author summary
V1 is necessary for normal visual processing and is known to process features of visual stimuli such as orientation, but whether V1 also encodes behavioral decisions is an unresolved issue, with conflicting evidence. Here, we demonstrate that V1 encodes a mixed variable that contains information about the stimulus as well as about the choice. We learn the structure of population responses in trials pertaining to the variable "stimulus+choice", and apply the resulting population vectors to trials that differ only in the choice of the animal, but not in the stimulus class. Moreover, we learn the structure of population responses on time-averaged data and then apply it to time-dependent (spiking) data. During the late phase of the trial, this procedure allows us to predict the upcoming choice of the animal with a time-dependent population signal. The spiking signal of a small neural population is sparse, and we hypothesize that positive correlations between neurons in the same decoding pool help the transmission of decision-related information downstream. We find that noise correlations within the same decoding pool are significantly stronger than across decoding pools, which corroborates our hypothesis on the benefit of noise correlations for the read-out of a time-dependent population signal.
Article activity feed
-
###Reviewer #3:
The authors ask whether and how information about an upcoming choice is encoded by neuronal activities in V1. To address this question, they recorded from multiple neurons in V1 simultaneously, while monkeys performed a delayed orientation-match-to-sample task. They then asked whether and how they could decode the stimulus presented to the animal, and/or the upcoming behavioral report of their decision (choice), from these V1 recordings. They found that the combination stimulus+choice could be decoded, and that bursty neurons were most likely to affect the decoded choice. Moreover, neurons in the superficial cortical layer also appeared to have a stronger choice signal. This suggests that the choice signal may arise outside of V1, but nevertheless be reflected by spiking activity within V1.
This study addresses an interesting and potentially important question: where do choice signals arise in the brain, and how do V1 activities relate to those choice signals? At the same time, I was quite confused about a lot of the data presented and overall remain somewhat unconvinced. My specific critiques are as follows:
In Fig. 1BC: what are these population vectors? In the case of "C", I assume these are the SVM weights that are used to discriminate between choices, and the data for each choice are pooled over both stimulus types (match or non-match). But for "S+C", I don't quite follow what is going on. Is it the case that you do the decoding just on the "correct" trials (as suggested in Table 1)? This critique should highlight the fact that I failed to understand your main point, about decoding C vs "S+C". Much more writing clarity throughout the paper would help with this, and make it possible for me to evaluate the paper's main claims.
Fig. 1D is claimed to tell us how neurons respond differently under different conditions, but it does not do that. It tells us how SVM decoders weight those neurons differently under different conditions. Moreover the result seems kind of trivial: it shows that "strong weights change more" between conditions. That's not very surprising: you are subtracting bigger numbers when there are stronger weights, so the differences will be larger. Is there more going on here?
In Fig. 2: what time intervals were the spikes summed for the decoding? There are some values given for different window lengths, but when did those windows start? Was it at the start of the "test" image presentation? Or some other time?
It seems like movement is a confound. The claim is that choice is represented in V1. But we know from recent work by Stringer et al. (Science 2019), that movement profoundly affects V1 spiking. So if any movement signals precede the behavioural report, those will correlate with choice and be reflected by V1 spiking. In that case, is it really fair to say that V1 encodes choice? Or, rather, that the pre-report motion of the animal is encoded in V1?
I couldn't find strong support for the claim that decoding is better when using superficial neurons vs. deeper ones. A panel like Fig. 7E (which does this for bursty vs non-bursty neurons) but comparing the different layers would help with this. I realize this result is somewhat implied by the differences in bursty neuron fraction across layers (which is shown), but this claim is central and so should be explicitly tested.
I have concerns about a lot of the statistical tests used in this paper. For example:
a) Fig. 2D: should do a permutation test, randomly assigning neurons to "big" vs "small" weight categories, then redoing the analysis. That will yield a p-value much more reliably than the t-test, which assumes (incorrectly) that the data are Gaussian. Another big issue is that the selection of small vs. big can have biasing effects, so the t-test between the two groups could greatly overemphasize significance. A permutation test is harder to fool in this way.
b) Fig 3D statistical test compares the analysis of data with optimized weights to a case of random weights and random permutation. That's not quite fair because you optimize the weights for the real data but not for the null hypothesis you are testing. A better test would be to do random permutations of the data, then train the weights on each random permutation and test on held-out data from that random permutation. It will likely yield similar results to what you've got, but be a more compelling test in my opinion.
c) Fig. 6B: not sure a t-test is right. Are these data Gaussian?
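The label-permutation scheme suggested in (a) and (b) could be sketched as follows. This is a minimal illustration on synthetic data, with a simple difference-of-means linear read-out standing in for the SVM; all names and values here are hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic trials-by-neurons data with a weak class difference
# (purely illustrative; stands in for the V1 population responses).
X = rng.normal(size=(200, 30))
y = rng.integers(0, 2, size=200)
X[y == 1] += 0.2

def fit_and_score(X, y, rng):
    """Train a simple linear read-out (difference of class means, used
    here in place of the SVM) on half the trials, score the other half."""
    idx = rng.permutation(len(y))
    train, test = idx[:100], idx[100:]
    w = X[train][y[train] == 1].mean(0) - X[train][y[train] == 0].mean(0)
    proj = X[train] @ w
    thresh = 0.5 * (proj[y[train] == 1].mean() + proj[y[train] == 0].mean())
    pred = (X[test] @ w > thresh).astype(int)
    return float((pred == y[test]).mean())

observed = fit_and_score(X, y, rng)

# Null distribution: permute the labels, then retrain AND retest on each
# permutation, so the null enjoys the same weight optimization as the data.
null = np.array([fit_and_score(X, rng.permutation(y), rng)
                 for _ in range(500)])
p_value = (np.sum(null >= observed) + 1) / (null.size + 1)
```

A test of this form controls for the selection and optimization steps that a plain t-test ignores, which is the point of both critiques above.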
- The results in Fig. 9BC seem interesting, but it's hard to parse the network diagrams. Showing 3x3 matrices of the CCM coefficients from neurons in each layer to neurons in each other layer would help me to evaluate the claim that the superficial layer acts as a hub.
-
###Reviewer #2:
Here the authors present results examining the possibility of decoding a choice signal from V1. They show that a transfer learning approach that mixes stimulus and choice during training provides information about choice that is slightly better than chance. In contrast, decoding choice directly using a linear SVM results in chance decoding. They then examine potential time-varying structure in the "choice signal" and nicely show that the strongest contributions are from bursting neurons in the superficial layers of V1.
This is a novel approach to an interesting open problem in systems neuroscience. However, based on my understanding, there are several core issues that need to be addressed.
Major Issues:
- I may have misunderstood, but it is not obvious to me that the "choice signal" that the authors report is a signature of choice and not just a stimulus-driven effect. From what I understand the same image was used during an entire recording session, and the difference between target and test is either 0deg (match) or 3-10deg (nonmatch). A decoder is trained to classify the test orientation (using the correct trials only). Then choice prediction accuracy and "choice signals" are assessed using the nonmatch trials. In this setting, it seems that if there is some tuning to the stimulus orientation and some variability in the responses that eventually influences the choice then you would see a difference in the choice signal as calculated here.
If the "choice signal" calculated here is present for the same/different responses under the match condition I would be more convinced that this is, in some sense, a representation of choice. The authors mention there were few trials in the IM condition, but it seems valuable to show. Alternatively, and I understand it may not be feasible at this stage, I would also be more convinced if the authors got similar results when the stimulus image varied from trial to trial within a recording session. Barring that, I have trouble seeing how this is a "representation" of choice, except under an extremely loose definition of "representation".
Unless I've misunderstood something fundamental (which is possible), it seems better to frame these results as "evidence that choice can be decoded from V1 activity at slightly better than chance in this particular task" rather than "a time-resolved code that reflects the instantaneous computation of the low-dimensional choice variable in animal's brain...[that] contributes to animal's behavior as it unfolds" (as stated in the introduction).
If I have misunderstood maybe the authors can clarify where I went wrong and/or show results from simulations to help me understand why the "choice signal" here is distinct from a situation where you just have purely feedforward effects with noisy sensory encoding in V1 and downstream decision making in a different brain area.
It is also not clear to me why the "zero crossing" is the relevant time point to consider when looking at the timing of the choice signal. The point where the choice signal is farthest from zero seems much more relevant and seems to occur very close to the point where firing rates are the highest. Some clarification on this issue would be helpful. Additionally, it could be worthwhile to test what happens when the data are not z-scored. This seems like it may get rid of the zero crossing altogether. I'm somewhat surprised that there is a difference in the same/different responses after 200ms, but the fact that similar differences appear at <50ms might point to a normalization issue.
I'm also concerned about the interpretation of the "plus" and "minus" and "strong" and "weak" subnetworks. It is not obvious to me whether the decoding weights will be stable. Particularly when decoding from small populations, the weights could be influenced by overfitting and omitted variables. This is a relatively minor concern compared to the above issues, but it could be helpful to explicitly measure how stable the weights are. The authors could show weights from the 1st half and 2nd half of the data or see if the weights change when decoding based on subsets of the observed neurons.
-
###Reviewer #1:
This article asks whether V1 encodes a behavioral choice variable using visual information. The authors propose an approach, termed generalized learning, to predict the choice variable using a time-resolved code computed from V1 population spiking, in an experiment that utilizes naturalistic stimuli.
More specifically, the authors build a decoder to predict the stimulus + choice (S+C) variable, and then utilize it to predict the choice variable. Using this approach, the authors report that population activity can predict the choice variable, relying on the overlap between the representation of the stimulus and that of the choice.
In addition, the authors identify/study the role of different sub-populations of neurons in enabling the prediction of the choice variable. The authors report that the accumulation of a choice signal at the input of a hypothetical read-out neuron facilitates the prediction of choice from V1 population activity. The authors also report that burstiness represents a useful feature of neurons, which facilitates the accumulation of the choice signal.
Finally, using an analysis of the intrinsic flow of V1 information with three sub-populations of neurons, the authors report that information about the choice in V1 likely comes from top-down processing.
Major comments:
- In Fig. 2B, I find it difficult to assess how significantly the S+C decoder's performance differs from chance, compared to the choice-only decoder. The authors report data from 20 sessions in Fig. 2A. It seems to me that if the authors were to use the balanced accuracy (BAC) from these 20 sessions to build an empirical distribution of BAC across sessions, the 95% confidence region would overlap with 0.5 (chance). Does that sound accurate to the authors?
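The across-session check proposed here could be sketched as follows, using made-up session values (the `bac` array is a hypothetical stand-in for the per-session accuracies behind Fig. 2, not the authors' data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical balanced accuracies (BAC) from 20 sessions; invented
# stand-ins for the per-session values, not the actual data.
bac = rng.normal(loc=0.55, scale=0.06, size=20)

# Bootstrap the across-session mean BAC and read off a 95% percentile
# confidence interval; the question is whether it includes chance (0.5).
boots = np.array([rng.choice(bac, size=bac.size, replace=True).mean()
                  for _ in range(10_000)])
ci_low, ci_high = np.percentile(boots, [2.5, 97.5])
overlaps_chance = bool(ci_low <= 0.5 <= ci_high)
```

Whether `overlaps_chance` comes out true on the real sessions is precisely what the reviewer is asking the authors to report.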
The authors do report that they've tested for the significance of the difference in the similarity vectors, and call them "weakly" similar.
Put more simply, my comment relates to the following, more basic, question: how does one interpret a BAC of 0.55 vs 0.5, in terms of how much overlap this means in the shared representation between stimulus and choice? What if the BAC had been 0.7 for S+C vs 0.5 for C? Do the authors think it possible to make more precise statements about the shared representation?
Similarly, how does one interpret different degrees of similarity? I understand the interpretation of the angle between the two vectors, and that at one extreme lies orthogonality and at the other co-linearity. Can one interpret the cosine of the angle between them as an amount of shared representation?
I think that this represents a point that the authors should expand upon, discuss more thoroughly in the manuscript, namely can we really make a statement about how much the representations of stimulus and choice overlap?
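For reference, the quantity at issue in the angle-based comparison is the cosine of the angle between the two decoding-weight vectors; a minimal sketch, with invented weight values for illustration:

```python
import numpy as np

def cosine_similarity(w1, w2):
    """Cosine of the angle between two weight vectors:
    1 = collinear, 0 = orthogonal, -1 = anti-aligned."""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

# Toy "stimulus+choice" and "choice" weight vectors (invented numbers).
w_sc = np.array([1.0, 0.5, -0.2, 0.0])
w_c = np.array([0.8, 0.1, 0.3, -0.4])
sim = cosine_similarity(w_sc, w_c)  # between -1 and 1
```

Whether an intermediate value of `sim` can be read as an amount of shared representation, rather than merely as evidence of non-orthogonality, is exactly the interpretive question raised above.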
- The authors' S+C analysis relies heavily on the data collected when the animal chooses correctly. As far as I understand, the authors suggest that the incorrect trials add "noise". I find this difficult to understand. Have the authors performed the S+C analysis when the animal chooses incorrectly? I could not clearly understand a) why restricting oneself to correct trials seems crucial, and b) the significance of this from the perspective of the representation of choice in the circuit.
A true decoder of S+C would have 4 possible outcomes (two that the authors already consider, and two additional ones coming from incorrect trials). The authors focus on two of these. To me, this deserves a detailed discussion.
I suggest that, very early on in the article, the authors make it clear that the S+C decoder conditions on correct choices, and that they address a) why restricting oneself to correct trials is crucial, and b) what this implies for the representation of choice in the circuit.
- Why do random weights (fig 4a, top right) work well? i.e., the figure looks very similar to fig 3c. As far as I understand, the random weights come from the empirical distribution of the weights (fig 6a). This seems agnostic to the layer to which a cell belongs. How do I reconcile this with the authors' statements about the importance of certain groups of cells for predicting the choice variable?
- The authors use different feature extraction for training and testing. The authors train on spike counts (features) and test on binary spiking activity smoothed with a first-order filter (exponential impulse response). One reason I think this might be problematic goes as follows: during training, the authors get a prediction from the SVM for a whole time segment. I have no problem with this. For testing, however, the authors get a prediction for every 1-ms bin. How does one translate that into a prediction of choice for the whole window?
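To make the train/test mismatch concrete, here is a sketch of the kind of time-resolved read-out being described: binary spike trains convolved with a causal exponential kernel, projected onto fixed decoding weights, then collapsed to a single trial-level prediction. All quantities are synthetic, and the collapse rule at the end is one possible choice, not necessarily the authors':

```python
import numpy as np

rng = np.random.default_rng(2)

n_neurons, n_ms = 10, 400
tau = 20.0  # exponential filter time constant in ms (illustrative value)

# Synthetic binary spike trains (neurons x 1-ms bins) and fixed weights
# standing in for the SVM weights learned on spike counts.
spikes = (rng.random((n_neurons, n_ms)) < 0.02).astype(float)
w = rng.normal(size=n_neurons)

# Causal first-order (exponential) filter applied to each spike train.
kernel = np.exp(-np.arange(200) / tau)
filtered = np.array([np.convolve(s, kernel)[:n_ms] for s in spikes])

# Time-resolved population signal: one value per 1-ms bin.
signal = w @ filtered  # shape (n_ms,)

# One way to turn per-millisecond values into a whole-trial prediction:
# the sign of the time-averaged signal (a hypothetical collapse rule).
trial_choice = int(signal.mean() > 0)
```

The reviewer's question is, in effect, which collapse rule (if any) the authors use, and what accuracy it achieves on held-out trials.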
I can understand the argument that testing on a different data set represents a form of transfer learning. My reservation comes from the apparent lack of an explicit prediction on the test set and of reported accuracies on the test data.
As they stand, I find the authors' statements about the differences in the choice signal, zero crossings, etc. very qualitative. It would be nice to report training and test accuracy, as is standard in ML.
-
##Preprint Review
This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.
###Summary:
Overall, as a group, the reviewers expressed excitement about the topic and questions posed in the paper. At the same time, the reviewers did not think that the data and the results of the analyses the authors report provide enough evidence to justify the claim of having found a "representation of choice" in V1. The following are two critiques that the reviewers have in common (please refer to the individual reviews for details):
- The fact that the authors restrict themselves to "correct only" trials to claim that V1 encodes choice raised concerns.
- The manner in which the authors conducted the computational and statistical analyses also raised a number of questions and concerns.