Modelling the neural code in large populations of correlated neurons

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper is of potential interest to neuroscientists interested in neural coding. It presents a novel family of statistical models that is more accurate than simple models that assume independence between neurons. The results provide evidence that the proposed encoding models accurately capture key statistics of realistic neural activity, and that Bayesian decoding based on them can be accurate and efficient. The manuscript would benefit from a more complete comparison with other models.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

Abstract

Neurons respond selectively to stimuli, and thereby define a code that associates stimuli with population response patterns. Certain correlations within population responses (noise correlations) significantly impact the information content of the code, especially in large populations. Understanding the neural code thus necessitates response models that quantify the coding properties of modelled populations, while fitting large-scale neural recordings and capturing noise correlations. In this paper, we propose a class of response models based on mixture models and exponential families. We show how to fit our models with expectation-maximization, and that they capture diverse variability and covariability in recordings of macaque primary visual cortex. We also show how they facilitate accurate Bayesian decoding, provide a closed-form expression for the Fisher information, and are compatible with theories of probabilistic population coding. Our framework could allow researchers to quantitatively validate the predictions of neural coding theories against both large-scale neural recordings and cognitive performance.

Article activity feed

  1. Author Response:

    Reviewer #1 (Public Review):

    Sokoloski et al. propose a new statistical model class for descriptive modeling of stimulus encoding in the spiking activity of neural populations. The main goals are to provide a model family that (G1) captures key activity statistics, such as spike count (noise) correlations, and their stimulus dependence, in potentially large neural populations, (G2) is relatively easy to fit, and (G3) when used as a forward encoder model for Bayesian decoders leads to efficient and accurate decoding. There are also three additional goals or claims: (C1) that this descriptive model family can serve to quantitatively test computational theories of probabilistic population coding against data, (C2) that the model can offer interpretable representations of information-limiting noise correlations, (C3) that the model can be extended to the case of temporal coding with dynamic stimuli and history dependence.

    The starting point of their model is a finite mixture of independent Poisson distributions, which is then generalized and extended in two ways. Due to the "mixture", the model can account for correlations between neurons (see G1). As with any mixture model, the model can be viewed in the language of latent variables, which (in this case) are discrete categorical variables corresponding to different mixture components. The two extensions of the model are based on realizing that the joint distribution (of the observed spike counts and the latent variables) is in the exponential family (EF), which opens the door to applying powerful classical results (e.g. towards G2-G3), and allows for the two extensions by: (E1) generalizing the Poisson distributions in the mixture components to Conway-Maxwell-Poisson distributions, and (E2) introducing stimulus dependence by allowing the natural parameters of the EF to depend on stimulus conditions. They call the resulting model a Conditional Poisson Mixture or CPM (although the "Poisson" in CPM really means Conway-Maxwell-Poisson). E1 is key for capturing under-dispersion, i.e. Fano factors below 1. For the case of a discrete set of stimulus conditions, they propose minimal and maximal versions of E2, depending on which natural parameters are stimulus dependent. In the case of a continuum of stimuli (they only consider a 1D continuum of stimulus orientations, e.g. in V1 encoding) they also consider a model-based parametric version of the minimal E2, which gives rise to von Mises orientation tuning curves.
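
    To make this structure concrete, here is a minimal sketch of a stimulus-conditioned Poisson mixture with von Mises tuning. It is illustrative only: the parameter values and names are ours, not the authors', and it omits the CoM-Poisson extension (E1) that handles under-dispersion.

    ```python
    import numpy as np
    from scipy.stats import poisson

    # Toy population: N neurons, K mixture components. All values are made up.
    N, K = 10, 3
    rng = np.random.default_rng(0)

    def von_mises_rates(x, gain, preferred, kappa):
        """Mean firing rates as a von Mises function of orientation x (radians)."""
        return gain * np.exp(kappa * np.cos(2.0 * (x - preferred)))

    # Component-specific tuning parameters; in the minimal vs. maximal variants
    # of E2, different subsets of these would be allowed to vary with the stimulus.
    gain = rng.uniform(1.0, 5.0, size=(K, N))
    preferred = rng.uniform(0.0, np.pi, size=(K, N))
    kappa = rng.uniform(0.5, 2.0, size=(K, N))
    weights = np.full(K, 1.0 / K)  # index (component) probabilities

    def log_likelihood(n, x):
        """log p(n | x): marginalize out the categorical latent index k."""
        comp = [np.log(weights[k])
                + poisson.logpmf(n, von_mises_rates(x, gain[k], preferred[k], kappa[k])).sum()
                for k in range(K)]
        return np.logaddexp.reduce(comp)

    def sample(x):
        """One population response: sample a component, then independent Poissons."""
        k = rng.choice(K, p=weights)
        return rng.poisson(von_mises_rates(x, gain[k], preferred[k], kappa[k]))

    n = sample(np.pi / 4)
    print(n, log_likelihood(n, np.pi / 4))
    ```

    Swapping the Poisson components for Conway-Maxwell-Poisson components (E1) adds a per-neuron dispersion parameter, which is what permits Fano factors below 1.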

    Strengths:

    - Proposing a new descriptive encoding model of spike responses that can account for sub-Poissonian and correlated noise structure, and yet can be tractably fit and accurately decoded.

    - Their experiments with simulated and real (macaque V1) data presented in Figs. 2-5 and Tables 1-2 provide good evidence that the model supports G1-G3.

    - Working out a concrete Expectation-Maximization algorithm that allows efficient fits of the model to data.

    - Exploiting the EF framework to provide a closed-form expression for the model's Fisher information for the minimal model class, a measure that plays a key role in theoretical studies of probabilistic population coding.

    As such, the paper makes a valuable contribution to the arsenal of descriptive models used to describe stimulus encoding in neural populations, including the structure and stimulus dependence of their higher-order statistics.

    Thank you very much for your thorough, exact, and positive evaluation of our manuscript!

    Weaknesses:

    1. I found the title and abstract too vague, and not informative enough as to the concrete contributions of this paper. These parts should more concretely and clearly describe the proposed/developed model family and the particular contributions listed above.

    We found your summary of the paper and model to be highly accurate, and we rewrote the abstract to summarize the key strengths as you’ve listed them. We found it difficult to develop a more exact title that wasn’t overlong, so we left it as is.

    2. I was not convinced about claims C1 and C2 (which also contribute to the vagueness of the abstract), but I think even without establishing these claims the more solid contributions of the paper are valuable. And while I can see how the model can be extended towards C3, there are no results pertaining to this in the current paper, nor even a concrete discussion of how the model may be extended in this direction.

    2.1) Regarding C1, the claim is supposed to follow from the fact that the model's joint distribution is in the exponential family (EF), and that they have reasonably shown G1-G3 (in particular, that it captures noise correlations and its Bayesian inversion provides an accurate decoder). While I agree with the latter part, what puzzles me is that in the probabilistic population coding (PPC) theoretical models that the authors claim can be quantitatively tested using their descriptive model, the encoder itself is, as far as I remember/understand, in the EF. By contrast, here the encoder is a mixture of EFs and as such is not itself in the EF. Perhaps this distinction is not key to the claim, but if so, this has to be clearly explained; more generally, the exact connection between the descriptive encoder model here and the models used in the PPC literature should be better elaborated.

    This claim was indeed poorly explained in our manuscript, and not self-evident. There is a deeper connection between our conditional models and PPCs, which we now make explicit in a new section of the manuscript (Constrained conditional mixtures support linear probabilistic population coding, line 364), which includes an equation (Equation 4) that shows their exact relationship.
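
    For readers without the revision at hand, the generic shape of that relationship can be sketched in our own notation (this is the standard linear-PPC property of exponential-family encoders, not a reproduction of Equation 4):

    ```latex
    p(\mathbf{n} \mid x)
      = \exp\!\big( \boldsymbol{\theta}(x) \cdot \mathbf{t}(\mathbf{n})
                    - \psi(\boldsymbol{\theta}(x)) \big)\, \nu(\mathbf{n})
    \quad \Longrightarrow \quad
    \log p(x \mid \mathbf{n})
      = \boldsymbol{\theta}(x) \cdot \mathbf{t}(\mathbf{n})
        - \psi(\boldsymbol{\theta}(x)) + \log p(x) + \mathrm{const}
    ```

    That is, the posterior is log-linear in the sufficient statistics t(n), which is the defining property of a linear PPC; the new section shows how a constrained form of the conditional mixtures retains this property even though the mixture over n alone is not itself in the EF.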

    2.2) Regarding C2, I do not see how their results in Fig 5 (and corresponding section) provide any evidence for this claim. As a theoretical neuroscientist, I take "interpretable" to mean with a mechanistic or computational (theoretical) interpretation. But, if anything, I think the example studied in Fig 5 provides a great example of the general point: that even when successful descriptive models accurately capture the statistics of data, they may nevertheless not reveal (or even hide or mis-identify) the mechanisms underlying the data. In this example's ground-truth model, the stimulus (orientation) is first corrupted by input noise and then an independent population of neurons with homogeneous tuning curves (and orientation-independent average population rate) responds to this corrupted version of the stimulus. That is a very simple AND mechanistic interpretation (which of course is not manifest to someone only observing the raw stimulus and spiking data). The fit CPM, on the other hand, does not reveal the continuous input noise mechanism (and homogeneous population response) directly, but instead captures the resulting noise correlation structure by inferring a large (~20) number of mixture components, in each of which the population response prefers a certain orientation. For a given stimulus orientation, the fluctuations between (3-4 relevant) mixture components then approximate the effect of input noise. This captures the generated data well, but misses the true mechanism and its simpler interpretation. Let me be clear that I don't take this as a fault of their descriptive model. This is a general phenomenon, despite which their descriptive model, like any expressive and tractable descriptive model, still can be a powerful tool for neural data analysis. I'm just not convinced about the claim.
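
    To fix ideas, here is a sketch of our reading of the ground-truth mechanism described above, with made-up parameter values:

    ```python
    import numpy as np

    # Ground-truth mechanism (our reading): the stimulus orientation is first
    # corrupted by input noise, and a population with homogeneous von Mises
    # tuning then responds with conditionally independent Poisson counts.
    rng = np.random.default_rng(1)
    N = 200
    preferred = np.linspace(0.0, np.pi, N, endpoint=False)  # homogeneous tiling

    def respond(x, noise_sd=0.1, gain=10.0, kappa=1.0):
        x_noisy = x + rng.normal(0.0, noise_sd)  # continuous input noise
        rates = gain * np.exp(kappa * np.cos(2.0 * (x_noisy - preferred)))
        return rng.poisson(rates)  # conditionally independent spike counts
    ```

    Marginally over the input noise the counts are correlated even though they are conditionally independent, and this is the structure the fitted CPM approximates with its discrete mixture components.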

    This is a very fair point, and we’ve reformulated a few passages to emphasize that the model is primarily descriptive, at least in our applications in the paper (see the new section title at line 393 and the first corresponding paragraph).

    2.3) Regarding C3, I think the authors can at least add a discussion of how the model can be extended in this direction (and as I'm sure they are aware, this can be done by generalizing the von Mises version of the model, whereby, I believe, the model can more generally be thought of as a finite mixture of GLMs).

    In Appendix 4 we detail the relationship between CPMs and GLMs. We also note here that, at least as far as we understand, CPMs are formally distinct from finite mixtures of GLMs: the easiest way to see this distinction is to note that the index probabilities of a CPM depend on the stimulus, whereas the equivalent index probabilities in a finite mixture of GLMs do not. We have also explained this in Appendix 4.
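
    In symbols (our notation), the distinction is:

    ```latex
    p_{\mathrm{CPM}}(\mathbf{n} \mid x)
      = \sum_{k=1}^{K} p(k \mid x)\, p(\mathbf{n} \mid k, x)
    \qquad \text{vs.} \qquad
    p_{\mathrm{MoGLM}}(\mathbf{n} \mid x)
      = \sum_{k=1}^{K} p(k)\, p(\mathbf{n} \mid k, x)
    ```

    where only the CPM's index probabilities p(k | x) vary with the stimulus.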

    Reviewer #2 (Public Review):

    Sokoloski, Aschner, and Coen-Cagli present a modeling approach for the joint activity of groups of neurons using a family of exponential models. The Conway-Maxwell (CoM) Poisson models extend the "standard" Poisson models, by incorporating dependencies between neurons.

    They present the CoM models and their ability to capture mixtures of Poisson distributions. Applied to V1 data from awake and anesthetized monkeys, they show that the models capture Fano factor values better than simple Poisson models, and compare spike count variability and co-variability. Log-likelihood ratios in Table 1 show on-par or better performance of different variants of the CoM models, identify the optimal number of parameters for maximizing the likelihood [balancing accuracy and overfitting], and demonstrate that the models are useful for decoding. Finally, they show how the latent variables of the model can help interpret the structure of population codes, using simple simulated Poisson models over 200 neurons.

    In summary, this new family of models offers a more accurate approach to the modelling and study of large populations, and so highlights the limited value of simple Poisson-based models. Under some conditions it has higher likelihood than Poisson models and uses fewer parameters than ANN models.

    However, the approach, presentation, and conclusions fall short on several issues that prevent a clear evaluation of the accuracy or benefits of this family of models. Chief among them is the missing comparison to other statistical models.

    1. Critically, the model is not evaluated against other commonly used models of the joint spiking patterns of large populations of neurons. For example: GLMs (e.g. Pillow et al Nature 2008), latent Gaussian models (e.g. Macke et al Neural Comp 2009), Restricted Boltzmann Machines (e.g. Gardella et al PNAS 2018), Ising models for large groups of neurons (e.g. Tkacik et al PNAS 2015, Meshulam et al Neuron 2017), and extensions to higher-order terms (Tkacik et al J Stat Mech 2013), coarse-grained versions (Meshulam et al Phys Rev Lett 2019), or Random Projections models (Maoz et al bioRxiv 2018).

    Most of these models have been used to model comparable or even larger populations than the ones studied here, often with very high accuracy, as measured by different statistics of the populations and detailed spiking patterns (see more below). Much of the benefit or usefulness of the new family of models hinges on its performance compared to these other models.

    We agree very much with this point, and have done our best to address it by thoroughly comparing our model with a factor analysis encoding model in Appendices 1 and 2, and summarizing these results at appropriate points in the manuscript (lines 196–199 and 325–328). In particular, we visualized and compared the performance of factor analysis with our mixture models, and found that (i) factor analysis is better at capturing the first- and second-order statistics of the data, but (ii) when evaluated on held-out data, the performance gap more or less vanishes. Moreover, we found that an encoding model based on FA performs poorly as a Bayesian decoder, and we provided preliminary evidence that this is because our mixture models can capture higher-order statistics that FA cannot. We believe that these results have been very valuable in conveying the strengths and weaknesses of the mixture model approach.

    We have also extended the introduction to explain the differences between the other model families suggested by the reviewer and our approach, and how the different assumptions about the form of the data make it difficult to compare them quantitatively (see lines 42–63). To wit, GLMs and latent Gaussian models both critically depend on modelling spike trains, and not spike counts. On the other hand, Restricted Boltzmann machines, Ising models, and random projection models all assume binary, rather than counting, spiking data. As such, any comparison would depend on coming up with methods for either (i) reshaping our datasets and comparing spike-train/binary spike-count likelihoods to trial-to-trial likelihoods, or (ii) extending our conditional mixture approach to temporal/binary data, both of which are beyond the scope of our paper. We instead used factor analysis, because it has been applied widely to modelling trial-to-trial spike counts, and thus avoided further transformations that might reduce the validity of our comparisons.

    2. As some of these models are exponential models, their relation to the family of models suggested by the authors is also relevant in terms of the learned latent variables. Moreover, the number of parameters needed by these different models should be compared to the CoM models and their variants.

    In our comparisons with factor analysis we also compared the number of latent states/dimensions required to achieve maximum performance. Overall FA was consistently the most efficient, at least when evaluated on the ability to capture second-order statistics, although our mixture models also performed quite well with modest numbers of parameters.

    3. The analysis focuses on simple statistics of neural activity, like Fano factors (Fig. 2), and on visual comparisons rather than clear quantitative ones. More direct assessments of performance in terms of other spiking statistics for single neurons and small groups (e.g., correlations of different orders), and direct comparison to individual spiking patterns (which would be practical for groups of up to 20 neurons), would be valuable.

    In Appendix 2 we evaluated the ability of our mixtures to capture the empirical skewness and kurtosis of recorded neurons, and found that the CoM-based mixture performs quite well (r² for the CoM-based mixture was between 0.6 and 0.9). Because FA cannot capture these higher-order moments, we speculate that modelling them is critical for maximizing decoding performance. This adds another perspective on the strengths of our approach, and we appreciate the suggestion.
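
    For concreteness, one way to run such a comparison is sketched below; `recorded` and `sampled` are hypothetical (trials × neurons) count arrays, and whether the manuscript computes its r² exactly this way is our assumption.

    ```python
    import numpy as np
    from scipy.stats import skew, kurtosis, pearsonr

    def higher_moment_fit(recorded, sampled):
        """Squared correlation between per-neuron empirical and model moments.

        recorded, sampled: (trials, neurons) spike-count arrays, with `sampled`
        drawn from the fitted model. Both names are illustrative placeholders.
        """
        out = {}
        for name, stat in [("skewness", skew), ("kurtosis", kurtosis)]:
            r, _ = pearsonr(stat(recorded, axis=0), stat(sampled, axis=0))
            out[name] = r ** 2
        return out
    ```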

    Reviewer #3 (Public Review):

    The authors use multivariate mixtures of Poisson or Conway-Maxwell-Poisson distributions to model neural population activity. They derive an EM algorithm, a formula for Fisher information, and a Bayesian decoder for such models, and show it is competitive with other methods such as ANNs. The paper is clear and didactically written, and I learned a lot from reading it. Other than a few typos, the math and analyses appear to be correct.

    Thank you for the positive evaluation!

    Nevertheless there are some ways the study could be further improved.

    Most importantly, code for performing these analyses needs to be publicly released. The EM algorithm is complicated, involving a gradient optimization on each iteration; it is very unlikely that people will rewrite this themselves, so unless the authors release well-packaged and well-documented code, their impact will be limited.

    We very much agree, and we have done this. We provide a link to our GitLab page, where all relevant code can be downloaded and installation instructions are provided (we indicate this in the manuscript at lines 799–803).
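
    For orientation, here is a minimal sketch of the EM loop for the simplest case (a plain Poisson mixture, single stimulus condition), where both steps are closed form; the CoM-Poisson and stimulus-dependent variants replace the M-step below with the per-iteration gradient optimization the reviewer mentions. This is illustrative, not a transcription of the released code.

    ```python
    import numpy as np
    from scipy.stats import poisson
    from scipy.special import logsumexp

    def em_poisson_mixture(counts, K, n_iters=100, seed=0):
        """EM for a K-component mixture of independent Poissons.

        counts: (trials, neurons) spike-count array for one stimulus condition.
        """
        rng = np.random.default_rng(seed)
        T, N = counts.shape
        # Jittered initialization around the empirical mean rates.
        rates = np.maximum(counts.mean(0), 1e-6) * rng.uniform(0.5, 1.5, (K, N))
        log_w = np.full(K, -np.log(K))
        for _ in range(n_iters):
            # E-step: responsibilities log r[t, k] = log p(k) + log p(n_t | k),
            # normalized over k.
            log_r = log_w + poisson.logpmf(counts[:, None, :], rates).sum(axis=-1)
            log_r -= logsumexp(log_r, axis=1, keepdims=True)
            r = np.exp(log_r)
            # M-step: responsibility-weighted means (closed form for Poisson).
            nk = r.sum(axis=0) + 1e-12
            rates = np.maximum((r.T @ counts) / nk[:, None], 1e-6)
            log_w = np.log(nk / T)
        return rates, np.exp(log_w)
    ```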

    Second, it would be nice to extend the model to continuous latent factors. It seems likely that one or two latent factors could do the work of many mixture components, as well as increasing the interpretability of the models.

    We certainly agree that in some cases continuous latent variables could be much more parsimonious. However, to the best of our knowledge most of the expressions that we rely on would no longer be closed-form, and so the machinery of the model would require suitable approximations. Nevertheless, it’s an interesting possibility that we now address in the Discussion (lines 482–491).

    Third, it would be interesting to see the models applied to more diverse types of population data (for example hippocampal place field recordings).

    We certainly agree with the importance of applying our model to other datasets; indeed, the purpose of our manuscript is to offer a method that can be applied broadly, and our goal in making the code publicly available is to facilitate that. However, we have decided to maintain the focus of this manuscript on the method itself, and to limit the application to one kind of data (V1), for which we now provide more extensive analysis and quantification of the response statistics (Figure 2 C-D, Figure 3 G-H, Appendix 2), a study of the sample sizes required to fit the model (Appendix 3), and model comparisons (Appendices 1–2). Overall, we feel that the paper is already quite long and dense even when limited to a single kind of data, and we believe applications to multiple kinds of data would be better suited to a different study focused on the comparisons between them. In that regard, we are certainly open to future collaborations on large-scale recordings from various stimulus-driven brain areas.

    Fourth, how does a user choose how many mixture components to add?

    To clarify this, we’ve added a section to the Methods (Strategies for choosing the CM form and latent structure) that addresses, in particular, how to choose the number of mixture components.
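
    One common strategy consistent with balancing accuracy and overfitting via held-out likelihood is to select the component count by cross-validation; in the sketch below, `fit` and `loglik` are stand-ins for whatever fitting and evaluation routines are available, not functions from the released code.

    ```python
    import numpy as np

    def choose_num_components(counts, candidate_ks, fit, loglik, n_folds=5, seed=0):
        """Pick the number of mixture components by cross-validated log-likelihood.

        fit(train_counts, k) -> model; loglik(model, test_counts) -> float.
        Both callables are hypothetical stand-ins.
        """
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(counts)), n_folds)
        scores = {}
        for k in candidate_ks:
            fold_lls = []
            for f in range(n_folds):
                train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
                model = fit(counts[train], k)
                fold_lls.append(loglik(model, counts[folds[f]]))
            scores[k] = float(np.mean(fold_lls))
        return max(scores, key=scores.get), scores
    ```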
