Population encoding of stimulus features along the visual hierarchy

Abstract

The retina and primary visual cortex (V1) both exhibit diverse neural populations sensitive to diverse visual features. Yet it remains unclear how neural populations in each area partition stimulus space to span these features. One possibility is that neural populations are organized into discrete groups of neurons, with each group signaling a particular constellation of features. Alternatively, neurons could be continuously distributed across feature-encoding space. To distinguish these possibilities, we presented a battery of visual stimuli to the mouse retina and V1 while measuring neural responses with multi-electrode arrays. Using machine learning approaches, we developed a manifold embedding technique that captures how neural populations partition feature space and how visual responses correlate with physiological and anatomical properties of individual neurons. We show that retinal populations discretely encode features, while V1 populations provide a more continuous representation. Applying the same analysis approach to convolutional neural networks that model visual processing, we demonstrate that they partition features much more similarly to the retina, indicating they are more like big retinas than little brains.

This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/8199502.

This review reflects comments and contributions by Ryan Cubero and one other crowd reviewer who opted to remain anonymous. Review synthesized by Ryan Cubero.

Brief summary of the study:

In this paper, Dyballa et al. investigated how retinal ganglion cells (RGCs) and primary visual cortex (V1) neurons represent sinusoidal drifting gratings and optical flow stimuli. Using non-negative tensor decomposition, the authors introduced an "encoding manifold", a low-dimensional representation that organizes neurons based on how they respond to different features of the stimulus ensemble, in contrast to traditional decoding manifolds, which are stimulus-based. Their analyses revealed that RGCs form distinct clusters with similar response properties, while V1 neurons are distributed along a continuous manifold. To explore the similarities between the way convolutional neural networks (CNNs) and populations of neurons encode visual features, the same encoding manifolds were generated from CNNs trained to classify images. The authors found that the CNN organization was quite distinct from V1 and even more clustered than the retina, suggesting a limitation of CNNs in simulating how the cortex processes visual stimuli.
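For readers unfamiliar with the decomposition step, the following is a minimal sketch of a generic non-negative CP (PARAFAC) decomposition of a neurons × stimuli × time tensor using multiplicative updates. It is not the authors' released pipeline: the function names `nonneg_cp` and `reconstruct`, the tensor sizes, the rank, and the update scheme are all our assumptions, chosen only to illustrate how each neuron ends up with a non-negative loading on each factor.

```python
import numpy as np

def reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factors (one column per component)."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def nonneg_cp(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Non-negative CP decomposition via multiplicative updates (illustrative only).

    X has shape (neurons, stimuli, time bins) and must be non-negative.
    Returns neuron, stimulus, and temporal factor matrices.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = rng.random((I, rank)), rng.random((J, rank)), rng.random((K, rank))
    for _ in range(n_iter):
        # Each update rescales one factor while the other two are held fixed.
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

# Toy example: a synthetic rank-2 tensor of 10 neurons, 6 stimuli, 8 time bins.
rng = np.random.default_rng(1)
true_factors = [rng.random((n, 2)) for n in (10, 6, 8)]
X = reconstruct(*true_factors)
A, B, C = nonneg_cp(X, rank=2)
rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
```

In this sketch, the rows of `A` are the per-neuron loadings; a manifold embedding would then be built on such loadings, by whatever method the authors actually use.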

We believe that the following positive aspects make the findings of this study of utmost importance:

The authors were the first to record the responses of RGCs (ex vivo) and V1 neurons (in vivo) to the same set of artificial and quasi-naturalistic stimuli.
They also introduced the concept of an "encoding manifold", via a non-negative tensor decomposition, which organizes the recorded neurons according to their responses to features of the stimulus.
Overall, the study is important as it potentially implies revisiting the concept of parallel pathways of visual processing. Moreover, the layer- and cell type-specific distribution of V1 responses potentially opens an avenue for understanding computations of visual features across cortical layers.
The study is also the first to show that ResNet50 and VGG16 trained to classify images in ImageNet do not encode visual feature space in the same way as neurons in V1, indicating a potential limitation of using CNNs to understand cortical function.
Compliments to the authors for releasing the analysis code, allowing for early adoption of the methods.

We also noted several major and minor points regarding the study and the manuscript that need to be addressed.

Major aspects of the study that need to be addressed:

How would the confounding responses of V1 populations to saccadic eye movements (Miura and Scanziani, 2022), engagement/pupil sizes (Franke et al., 2022) or free behavior/locomotion despite head-fixation (Saleem et al., 2013; Niell and Stryker, 2010; Pakan et al., 2016; Mimica et al., 2023; among others) contribute to the continuity of the manifold? If these confounds are taken into account, will we still find a continuous encoding manifold as inferred in this study? How does the non-negative tensor decomposition take into account unobserved latent features?
The alternative hypothesis that the authors seem to test is the parallel pathways of visual processing, i.e., responses of V1 neurons aggregate responses from certain RGC types, thereby producing new groupings. We believe that, given the potential confounds as well as the contribution of noisy responses, this hypothesis is not completely ruled out. Is there a way to connect the retina and V1 manifolds? If the encoding manifold were calculated by combining the RGC and V1 neuron responses, would one observe V1 neurons that aggregate close to RGC clusters and, if so, are the visual features in this aggregate a superposition of those to which the surrounding RGCs respond?
How robust are the encoding manifolds to noisy responses from RGCs and V1 neurons? The authors commented that higher factors only reconstruct noise to emphasize the importance of choosing the right tensor dimension. If the neural responses are corrupted by noise (jittering), will one still recover the inferred manifold?
Moreover, looking at the neural responses along the manifold in Fig. 4a, it appears that these trajectories can be explained by each neuron's firing rate to each stimulus (it is not surprising that the ends of the "red" and "yellow" paths are populated by inhibitory neurons). What happens to the encoding manifold when the neural responses are normalized by each neuron's firing rate?
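To make the two robustness checks above concrete, here is a minimal sketch of what we have in mind: jittering spike times before binning, and dividing each neuron's responses by its mean firing rate before the decomposition. The function names, the Gaussian jitter model, the bin count, and the toy data are our assumptions and are not taken from the authors' code.

```python
import numpy as np

def jitter_spike_times(spike_times, sigma, t_max, rng):
    """Add Gaussian jitter (std `sigma`, in seconds) to each spike, clipped to [0, t_max]."""
    jittered = spike_times + rng.normal(0.0, sigma, size=spike_times.shape)
    return np.sort(np.clip(jittered, 0.0, t_max))

def psth(spike_times, t_max, n_bins):
    """Bin spike times into a firing-rate histogram (spikes per second)."""
    counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, t_max))
    return counts / (t_max / n_bins)

def normalize_by_rate(responses, eps=1e-12):
    """Divide each neuron's responses by its mean rate over all stimuli and time bins.

    `responses` has shape (neurons, stimuli, time bins).
    """
    mean_rate = responses.mean(axis=(1, 2), keepdims=True)
    return responses / (mean_rate + eps)

rng = np.random.default_rng(0)

# Toy spike train: 50 spikes in a 1 s trial, jittered with a 10 ms standard deviation.
spikes = np.sort(rng.uniform(0.0, 1.0, size=50))
jittered = jitter_spike_times(spikes, sigma=0.01, t_max=1.0, rng=rng)
rates = psth(jittered, t_max=1.0, n_bins=20)

# Toy response tensor (4 neurons, 3 stimuli, 20 time bins) with arbitrary rates.
responses = rng.gamma(2.0, 5.0, size=(4, 3, 20))
normalized = normalize_by_rate(responses)
```

Re-running the manifold inference on the jittered and on the rate-normalized tensors, and comparing the result with the original embedding, would show how much of the reported structure depends on spike timing precision and on overall firing rate.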

Major aspects of the manuscript that need to be addressed:

The Introduction would benefit from a more expansive literature review on the topic. It needs more context and motivation as to why the authors would assume distinct V1 responses to visual stimuli, given the vast amount of work on the retinotopic organization in the visual cortex. Below are also some notes that we have made:
- In the Introduction, the authors noted the "mounting evidence for distinct neuronal identities in V1". We believe the authors should be careful in phrasing this. Data from Gouwens et al. (2019) showed that while GABAergic inhibitory neurons can be classified into distinct morphological, electrophysiological and transcriptomic (met-) types, glutamatergic neuron classification can be rather unstable. This can also be seen from single-nucleus sequencing, where glutamatergic neurons form a single continuous manifold, organized by layers, while GABAergic neurons break into clusters based on cell type (Tasic et al., 2018 and Cheng et al., 2022). This picture is completely different when looking at single-cell transcriptomes of RGCs (Tran et al., 2019; Goetz et al., 2022), where one observes distinct transcriptomic clusters.
- The Introduction would also benefit from explicitly stating the hypothesis. The authors state the following in the Discussion section, and readers would benefit if it were also stated in the Introduction: "For example, it is possible that retina and V1, while being sensitive to different visual features, could organize the encoding of these features similarly and thereby exhibit similarly clustered (or continuous) encoding manifolds." We believe that contextualizing the vast literature on the parallel pathways as well as the retinotopic organization in V1 will greatly improve the readability of the manuscript.
- The introduction also needs to refine the novelty factor of inferring the encoding manifold. What is the added value of the method, as opposed to previous neural manifolds used in the literature? It will also benefit the manuscript to have references for conventional manifold approaches applied to RGCs and V1 neurons.
- Towards the end of the Introduction, the authors stated that "the population-level representations of stimulus features are fundamentally different between retina and visual cortex". We find this to be a very strong conclusion, given that the presented stimuli do not fully cover the feature space (which the authors have noted in the Discussion section). The authors could be more specific about the type of stimulus features tested in the paper, since other stimulus sets could result in a different conclusion.
We would like to note that the manuscript suffers from generic phrasings that make the reader "imagine" something. We urge the authors to stick to descriptive and concrete examples to make it more tangible to non-experts.
We think that some references in the Conclusion would be better suited to the Discussion. We believe that the Conclusion should be kept as a summary of the whole paper without introducing new considerations or concepts.
In the Conclusion, the authors wrote: "As we showed, it is applicable across areas and species". We are not sure how this statement is substantiated by the results of the paper.

Minor aspects of the study that can be addressed:

How does the stimulus-based "decoding" manifold differ from the encoding manifold, and what information is lost by only looking at the decoding perspective? Could it be that the decoding manifold will also have clusters, where only certain neurons are responding, which can then be mapped onto the encoding manifold?
Non-uniform sampling of features. In the artificial ring model, the authors uniformly sampled the preferred directions and thereby reconstructed the ring. How does the method perform when the directions are non-uniformly sampled?
The authors observed that the CNN encoding manifold is discrete, resembling the retinal manifold more than that of V1. One possible explanation for the discreteness of the CNN manifold is the lack of temporal structure in the responses of a CNN "neuron". Note that in the retina, certain ganglion cells have either transient or sustained responses, which the authors have shown form neighboring clusters in the encoding manifold. Looking at the responses in Fig. 5d,f, there does not seem to be any transient response to the stimulus, in contrast to the responses of RGCs and V1 neurons.
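The non-uniform sampling test for the ring model could be set up along the following lines. This is a minimal sketch under our own assumptions: von Mises tuning curves, 200 model neurons, 8 stimulus directions, and a hypothetical bias of preferred directions around π/2; none of these values come from the manuscript.

```python
import numpy as np

def ring_responses(preferred, stim_dirs, kappa=2.0):
    """Von Mises tuning curves: one row per neuron, one column per stimulus direction."""
    return np.exp(kappa * (np.cos(stim_dirs[None, :] - preferred[:, None]) - 1.0))

def circular_resultant(angles):
    """Mean resultant length: near 0 for uniform coverage, 1 for a single direction."""
    return np.abs(np.mean(np.exp(1j * angles)))

rng = np.random.default_rng(0)
n_neurons = 200
stim_dirs = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)

# Uniformly sampled preferred directions (the case tested by the authors).
uniform_pref = rng.uniform(0.0, 2 * np.pi, size=n_neurons)
# Non-uniform case: preferred directions concentrated around pi/2 (hypothetical bias).
biased_pref = np.mod(rng.vonmises(np.pi / 2, 2.0, size=n_neurons), 2 * np.pi)

resp_uniform = ring_responses(uniform_pref, stim_dirs)
resp_biased = ring_responses(biased_pref, stim_dirs)
```

Feeding `resp_biased` instead of `resp_uniform` into the manifold inference would show whether sparsely sampled arcs of the ring are still recovered as a continuous structure or break into apparent clusters.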

Minor aspects of the manuscript/reporting that can be addressed:

The reference to Devries and Baylor (1997) is duplicated.
When the authors looked into the encoding manifold of CNNs, is there any reason why the authors chose to highlight the manifold in Stage 4? Perhaps, this can be made clearer.
In the Discussion section, the authors noted generating spatially-isotropic stimuli (like the flow stimuli) "to mitigate the impact of cells having receptive fields at different retinotopic locations". We believe this may be an important consideration and may be worth clarifying.
Will the data (stimulus set and responses of neurons) be publicly available online after publication or will it remain available "upon reasonable request"?

The following are suggestions for future studies:

Encoding manifold of naturalistic stimuli in RGCs and V1 populations: Given that RGCs generally evolved to "detect behaviorally relevant features of natural scenes" (Goetz et al., 2022), would we see a continuous manifold in RGCs for such stimuli? Would the resulting manifold indicate which "behaviorally relevant features" the recorded RGCs capture and relay to higher visual pathways?
Encoding manifold in dLGN and superior colliculus: In the Conclusion, the authors noted their interest in the topological organization of the dLGN. It would also be interesting to look into the topological organization of the superior colliculus, whose superficial layer receives projections from ∼90% of RGCs. Are there already hints of encoding continuity in these regions? Is there a method that can quantify the
Recently, Maheswaranathan et al. (2023) used a three-layer CNN trained to predict RGC responses to natural scenes and observed that the latent responses coincide with those of interneurons, showing that CNNs can be used to understand visual processing. Inspired by this study, it is possible to build a multi-layered CNN trained to predict the responses of V1 neurons to natural stimuli. We wonder whether the encoding manifold could then serve as an indicator that such a CNN is a suitable model for visual processing.

We wish the authors the best of luck on their future research endeavors!

References:

Miura, S. K., & Scanziani, M. (2022). Distinguishing externally from saccade-induced motion in visual cortex. Nature, 610(7930), 135-142.
Franke, K., Willeke, K. F., Ponder, K., Galdamez, M., Zhou, N., Muhammad, T., ... & Tolias, A. S. (2022). State-dependent pupil dilation rapidly shifts visual feature selectivity. Nature, 610(7930), 128-134.
Saleem, A. B., Ayaz, A., Jeffery, K. J., Harris, K. D., & Carandini, M. (2013). Integration of visual motion and locomotion in mouse visual cortex. Nature Neuroscience, 16(12), 1864-1869.
Niell, C. M., & Stryker, M. P. (2010). Modulation of visual responses by behavioral state in mouse visual cortex. Neuron, 65(4), 472-479.
Pakan, J. M., Lowe, S. C., Dylda, E., Keemink, S. W., Currie, S. P., Coutts, C. A., & Rochefort, N. L. (2016). Behavioral-state modulation of inhibition is context-dependent and cell type specific in mouse visual cortex. eLife, 5, e14985.
Mimica, B., Tombaz, T., Battistin, C., Fuglstad, J. G., Dunn, B. A., & Whitlock, J. R. (2023). Behavioral decomposition reveals rich encoding structure employed across neocortex in rats. Nature Communications, 14(1), 3947.
Gouwens, N. W., Sorensen, S. A., Berg, J., Lee, C., Jarsky, T., Ting, J., ... & Koch, C. (2019). Classification of electrophysiological and morphological neuron types in the mouse visual cortex. Nature Neuroscience, 22(7), 1182-1195.
Tasic, B., Yao, Z., Graybuck, L. T., Smith, K. A., Nguyen, T. N., Bertagnolli, D., ... & Zeng, H. (2018). Shared and distinct transcriptomic cell types across neocortical areas. Nature, 563(7729), 72-78.
Cheng, S., Butrus, S., Tan, L., Xu, R., Sagireddy, S., Trachtenberg, J. T., ... & Zipursky, S. L. (2022). Vision-dependent specification of cell types and function in the developing cortex. Cell, 185(2), 311-327.
Tran, N. M., Shekhar, K., Whitney, I. E., Jacobi, A., Benhar, I., Hong, G., ... & Sanes, J. R. (2019). Single-cell profiles of retinal ganglion cells differing in resilience to injury reveal neuroprotective genes. Neuron, 104(6), 1039-1055.
Goetz, J., Jessen, Z. F., Jacobi, A., Mani, A., Cooler, S., Greer, D., ... & Schwartz, G. W. (2022). Unified classification of mouse retinal ganglion cells using function, morphology, and gene expression. Cell Reports, 40(2).
Maheswaranathan, N., McIntosh, L. T., Tanaka, H., Grant, S., Kastner, D. B., Melander, J. B., ... & Baccus, S. A. (2023). Interpreting the retinal neural code for natural scenes: From computations to neurons. Neuron.

Competing interests

The author declares that they have no competing interests.
