A chromatic feature detector in the retina signals visual context changes

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This paper makes a valuable contribution to approaches to studying the stimulus selectivity of sensory neurons. The imaging data that forms the core of the paper is compelling, but the evidence for some of the conclusions reached is limited. A central issue is a reliance on linear measures of stimulus selectivity, which may miss key aspects of retinal coding.

Abstract

The retina transforms patterns of light into visual feature representations supporting behaviour. These representations are distributed across various types of retinal ganglion cells (RGCs), whose spatial and temporal tuning properties have been studied extensively in many model organisms, including the mouse. However, it has been difficult to link the potentially nonlinear retinal transformations of natural visual inputs to specific ethological purposes. Here, we discover a nonlinear selectivity to chromatic contrast in an RGC type that allows the detection of changes in visual context. We trained a convolutional neural network (CNN) model on large-scale functional recordings of RGC responses to natural mouse movies, and then used this model to search in silico for stimuli that maximally excite distinct types of RGCs. This procedure predicted centre colour opponency in transient suppressed-by-contrast (tSbC) RGCs, a cell type whose function has been debated. We confirmed experimentally that these cells indeed responded very selectively to Green-OFF, UV-ON contrasts. This type of chromatic contrast was characteristic of transitions from ground to sky in the visual scene, as might be elicited by head or eye movements across the horizon. Because tSbC cells performed best among all RGC types at reliably detecting these transitions, we suggest a role for this RGC type in providing contextual information (i.e. sky or ground) necessary for the selection of appropriate behavioural responses to other stimuli, such as looming objects. Our work showcases how combining experiments with natural stimuli and computational modelling enables the discovery of novel types of stimulus selectivity and the identification of their potential ethological relevance.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This paper combines a number of cutting-edge approaches to explore the role of a specific mouse retinal ganglion cell type in visual function. The approaches used include calcium imaging to measure responses of RGC populations to a collection of visual stimuli and CNNs to predict the stimuli that maximally activate a given ganglion cell type. The predictions about feature selectivity are tested and used to generate a hypothesized role in visual function for the RGC type identified as interesting. The paper is impressive; my comments are all related to how the work is presented.

    We thank the reviewer for appreciating our study and for the interesting comments.

    Is the MEI approach needed to identify these cells?

    To briefly summarize the approach, the paper fits a CNN to the measured responses to a range of stimuli, extracts the stimulus (over time, space, and color) that is predicted to produce a maximal response for each RGC type, and then uses these MEIs to investigate coding. This reveals that G28 shows strong selectivity for its own MEI over those of other RGC types. The feature of the G28 responses that differentiate it appears to be its spatially-coextensive chromatic opponency. This distinguishing feature, however, should be relatively easy to discover using more standard approaches.

    The concern here is that the paper could be read as indicating that standard approaches to characterizing feature selectivity do not work and that the MEI/CNN approach is superior. There may be reasons why the latter is true that I missed or were not spelled out clearly. I do think the MEI/CNN approach as used in the paper provides a very nice way to compare feature selectivity across RGC types - and that it seems very well suited in this context. But it is less clear that it is needed for the initial identification of the distinguished response features of the different RGC types. What would be helpful for me, and I suspect for many readers, is a more nuanced and detailed description of where the challenges arise in standard feature identification approaches and where the MEI/CNN approaches help overcome those challenges.

    Thank you for the opportunity to clarify. In fact, the MEI (or an alternative nonlinear approach) is strictly necessary to discover this selectivity: as we show above (response #1 to the editorial summary), the traditional linear filter approach does not reveal the color opponency. We realize that this fact was not made sufficiently clear in the initial submission. In the revised manuscript, we now include this analysis. Moreover, throughout the manuscript, we added explanations of the differences between MEIs and standard approaches and more intuition about how to interpret MEIs. We also added a section to the Discussion dedicated to explaining the advantages and limitations of the MEI approach.
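    The core optimization idea behind an MEI can be sketched in a few lines. The sketch below is our own toy illustration, not the authors' code: a single softplus neuron stands in for the fitted CNN, and gradient ascent on the stimulus, under a fixed norm (analogous to the contrast budget used for MEIs), recovers the stimulus that maximally excites it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "model": one neuron whose response to a stimulus
# vector s is softplus(w . s). In the paper, the model is a CNN fitted
# to recorded RGC responses; this stand-in only illustrates the idea.
dim = 20
w = rng.standard_normal(dim)

def response(s):
    return np.log1p(np.exp(w @ s))          # softplus(w . s)

def grad_response(s):
    return w / (1.0 + np.exp(-(w @ s)))     # d/ds softplus = sigmoid(w . s) * w

# MEI search: gradient ascent on the stimulus, projecting back onto a
# fixed norm after every step.
s = rng.standard_normal(dim)
s /= np.linalg.norm(s)
for _ in range(500):
    s = s + 0.1 * grad_response(s)
    s /= np.linalg.norm(s)

# For this toy model, the optimal stimulus is the filter direction itself.
cosine = (s @ w) / np.linalg.norm(w)
```

For a nonlinear multi-layer model, the same loop (with autodiff supplying the gradient) climbs to a peak of a nonlinear response landscape, which is what distinguishes the MEI from a linear-filter estimate.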

    Interpretation of MEI temporal structure

    Some aspects of the extracted MEIs look quite close to those that would be expected from more standard measurements of spatial and temporal filtering. Others - most notably some of the temporal filters - do not. In many of the cells, the temporal filters oscillate much more than linear filters estimated from the same cells. In some instances, this temporal structure appears to vary considerably across cells of the same type (Fig. S2). These issues - both the unusual temporal properties of the MEIs and the heterogeneity across RGCs of the same type - need to be discussed in more detail. Related to this point, it would be nice to understand how much of the difference in responses to MEIs in Figure 4d is from differences in space, time, or chromatic properties. Can you mix and match MEI components to get an estimate of that? This is particularly relevant since G28 responds quite well to the G24 MEI.

    One advantage of the MEI approach is that it allows distinguishing between transient and sustained cells in a way that is not possible with the linear filter approach: Because we seek to maximize activity over an extended period of time, transient cells need to be repetitively stimulated, whereas sustained cells will also respond in the absence of multiple contrast changes. In the revised manuscript, we added a section explaining this, together with Figure 3-supplement 2, which illustrates this point by showing that the oscillations disappear when we optimize the MEI for a short time window. The benefit of a longer time window lies in the increased discriminability between transient and sustained cells, which is also shown in the new supplementary figure.

    Regarding the heterogeneity of MEIs, this is most likely due to heterogeneity within the RGC group: “The mixed non-direction-selective groups G17 and G31 probably contain more than one type, as supported by multiple distinct morphologies and genetic identities (for example, G31,32, Extended Data Fig. 5) or response properties (for example, G17, see below)” (Baden et al. Nature 2016). We added a paragraph in the Results section.

    Concerning the reviewer’s last point: We agree that it is important to know whether the defining feature - i.e., the selectivity for chromatic contrast - is robust against variations in other stimulus properties. New electrophysiological data included in the manuscript (Fig. 6e,f) offers some insights here. We probed G28/tSbC cells with full-field flashed stimuli that varied in chromatic contrast. Despite not matching the cell’s preferred spatial and temporal properties, this stimulus still recovered the cell’s preference for chromatic contrast. While we think it is an interesting direction to systematically quantify the relative importance of temporal, spatial and chromatic MEI properties for an RGC type’s responses, we think this is beyond the scope of this manuscript.

    Explanation of RDM analysis

    I really struggled with the analysis in Figure 5b-c. After reading the text several times, this is what I think is happening. Starting with a given RGC type (#20 in Figure 5b), you take the response of each cell in that group to the MEI of each RGC type, and plot those responses in a space where the axes correspond to responses of each RGC of this type. Then you measure euclidean distance between the responses to a pair of MEIs and collect those distances in the RDM matrix. Whether correct or not, this took some time to arrive at and meant filling in some missing pieces in the text. That section should be expanded considerably.
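    As a concrete sketch of the procedure described above (with hypothetical response values, not the recorded data): the responses of all cells of one type to each MEI form a point per MEI in a space whose axes are the cells, and the RDM collects the pairwise Euclidean distances between those points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: responses of 10 cells of one RGC type to the MEIs
# of 32 types (cells x MEIs). Real values would come from the model or
# the verification experiment.
n_cells, n_meis = 10, 32
responses = rng.random((n_cells, n_meis))

# Each MEI is a point in "population response space" (one axis per cell);
# the RDM holds the Euclidean distance between every pair of MEIs.
rdm = np.zeros((n_meis, n_meis))
for i in range(n_meis):
    for j in range(n_meis):
        rdm[i, j] = np.linalg.norm(responses[:, i] - responses[:, j])
```

The resulting matrix is symmetric with a zero diagonal; large entries mark MEI pairs that the population of this type represents very differently.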

    We appreciate the reviewer’s efforts to understand this analysis and confirm that they interpreted it correctly. However, we decided to remove the analysis. The point we were trying to make with this analysis is that the transformation implemented by G28/tSbC cells “warps” stimulus space and increases the discriminability of stimuli with characteristics similar to the cell’s MEI. We now make this point in a - we think - more accessible manner by the new analysis of the nonlinearity of G28/tSbC cells’ color opponency (see above).

    Centering of MEIs

    How important is the lack of precise centering of the MEIs when you present them? It would be helpful to have some idea about that - either from direct experiments or using a model.

    In the electrophysiological experiments, the MEIs were centered precisely (now Fig. 5 in the revised manuscript), and these experiments yielded almost identical results to the 2P imaging experiments, where the MEIs were presented on a grid to approximate the optimal position for the recorded cells. Additionally, all model simulations work with perfectly centered MEIs. We hence conclude that our grid approach to presenting stimuli provided sufficient precision in stimulus positioning.

    We added this information to the revised manuscript.

    Reviewer #2 (Public Review):

    This paper uses two-photon imaging of mouse ganglion cells responding to chromatic natural scenes along with convolutional neural network (CNN) models fit to the responses of a large set of ganglion cells. The authors analyze CNN models to find the most effective input (MEI) for each ganglion cell as a novel approach to identifying ethological function. From these MEIs they identify chromatic opponent ganglion cells, and then further perform experiments with natural stimuli to interpret the ethological function of those cells. They conclude that a type of chromatic opponent ganglion cell is useful for the detection of the transition from the ground to the sky across the horizon. The experimental techniques, data, and fitting of CNN models are all high quality. However, there are conceptual difficulties with both the use of MEIs to draw conclusions about neural function and the ethological interpretations of experiments and data analyses, as well as a lack of comparison with standard approaches. These bear directly both on the primary conclusions of the paper and on the utility of the new approaches.

    We thank the reviewer for the detailed comments.

    1. Claim of feature detection.

    The color opponent cells are cast as a "feature detector" and the term 'detector' is in the title. However insufficient evidence is given for this, and it seems likely a mischaracterization. An example of a ganglion cell that might qualify as a feature detector is the W3 ganglion cell (Zhang et al., 2012). These cells are mostly silent and only fire if there is differential motion on a mostly featureless background. Although this previous work does not conduct a ROC analysis, the combination of strong nonlinearity and strong selectivity are important here, giving good qualitative support for these cells as participating in the function of detecting differential motion against the sky. In the present case, the color opponent cells respond to many stimuli, not just transitions across the horizon. In addition, for the receiver operator characteristic (ROC) analysis as to whether these cells can discriminate transitions across the horizon, the area under the curve (AUC) is on average 0.68. Although there is not a particular AUC threshold for a detector or diagnostic test to have good discrimination, a value of 0.5 is chance, and values between 0.5 and 0.7 are considered poor discrimination, 'not much better than a coin toss' (Applied Logistic Regression, Hosmer et al., 2013, p. 177). The data in Fig. 6F is also more consistent with a general chromatic opponent cell that is not highly selective. These cells may contribute information to the problem of discriminating sky from ground, but also to many other ethologically relevant visual determinations. Characterizing them as feature detectors seems inappropriate and may distract from other functional roles, although they may participate in feature detection performed at a higher level in the brain.

    The reviewer apparently uses a rather narrow definition of a feature detector. We, however, argue for a broader definition, which, in our view, better captures the selectivities described for RGCs in the literature. For example, while W3 cells have been studied quite extensively, one can probably agree that only a fraction of the possible stimulus space has been explored so far. Therefore, it cannot be excluded that W3 cells also respond to features other than small dark moving dots, yet we (like the reviewer) still refer to them as feature detectors. Or, for instance, direction-selective (DS) RGCs are commonly considered feature detectors (i.e., responsive to a specific motion direction), although they also respond to flashes and spike when null-direction motion is paused (Barlow & Levick J Physiol 1965).

    The G28/tSbC cells’ selectivity for full-field changes in chromatic contrast enables them to encode ground-sky horizon transitions reliably across stimulus parameters (e.g., see new Fig. 7i panel). This cell type is thus well-suited to contribute to detecting context changes, as elicited by ground-sky transitions.

    Therefore, we think that the G28/tSbC RGC can be considered a feature detector and as such, could be used at a higher level in the brain to quickly detect changes in visual context (see also Kerschensteiner Annu Rev Vis Sci 2022). Still, their signals may also be useful for other computations (e.g., defocus, as discussed in our manuscript).

    Regarding the ROC analysis, we acknowledge that an average AUC of 0.68 may seem comparatively low; however, this is based on the temporally downsampled information (i.e., by way of Ca2+ imaging) gathered from the activity of a single cell. A downstream area would have access to the activity of a local population of cells. This AUC value should therefore be considered a lower bound on the discrimination performance of a downstream area. We now comment on this in the manuscript.
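    For reference, the AUC underlying such an ROC analysis has a simple rank-based form (equivalent to the Mann-Whitney U statistic): the probability that a randomly chosen response to a "transition" frame exceeds a randomly chosen response to any other frame. The sketch below uses toy numbers, not the recorded responses.

```python
# Rank-based AUC: fraction of (transition, non-transition) response pairs
# in which the transition response is larger, counting ties as half.
def auc(pos, neg):
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

transition_resp = [2.0, 3.0, 4.0]   # toy responses to ground-to-sky frames
other_resp = [1.0, 2.0, 3.0]        # toy responses to all other frames
print(auc(transition_resp, other_resp))  # 7/9 ~ 0.778
```

An AUC of 0.5 is chance; pooling several cells (e.g., summing or averaging their responses before computing the statistic) typically raises the population-level AUC above that of a single cell, which is the point made above.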

    1. Appropriateness of MEI analysis for interpretations of the neural code.

    There is a fundamental incompatibility between the need to characterize a system with a complex nonlinear CNN and then characterizing cells with a single MEI. MEIs represent the peak in a complex landscape of a nonlinear function, and that peak may or may not occur under natural conditions. For example, MEIs do not account for On-Off cells, On-Off direction selectivity, nonlinear subunits, object motion sensitivity, and many other nonlinear cell properties where multiple visual features are combined. MEIs may be a useful tool for clustering and distinguishing cells, but there is not a compelling reason to think that they are representative of cell function. This is an open question, and thus it should not be assumed as a foundation for the study. This paper potentially speaks to this issue, but there is more work to support the usefulness of the approach. Neural networks enable a large set of analyses to understand complex nonlinear effects in a neural code, and it is well understood that the single-feature approach is inadequate for a full understanding of sensory coding. A great concern is that the message that the MEI is the most important representative statistic directs the field away from the primary promise of the analysis of neural networks and takes us back to the days when only a single sensory feature is appreciated, now the MEI instead of the linear receptive field. It is appropriate to use MEI analyses to create hypotheses for further experimental testing, and the paper does this (and states as much) but it further takes the point of view that the MEI is generally informative as the single best summary of the neural code. The representation similarity analysis (Fig. 5) acts on the unfounded assumption that MEIs are generally representative and conveys this point of view, but it is not clear whether anything useful can be drawn from this analysis, and therefore this analysis does not support the conclusions about changes in the representational space. Overall this figure detracts from the paper and can safely be removed. In addition, in going from MEI analysis to testing ethological function, it should be made much more clear that MEIs may not generally be representative of the neural code, especially when nonlinearities are present that require the use of more complex models such as CNNs, and thus testing with other stimuli are required.

    The reviewer correctly characterizes MEIs as representing the peak in a nonlinear loss landscape that, in this case, describes the neurons’ tuning. As such, the MEI approach is indeed capable of characterizing nonlinear neuronal feature selectivities that are captured by a nonlinear model, such as the CNN we used here. We therefore disagree with the suggestion that MEIs should not be used “when nonlinearities are present that require the use of more complex models such as CNNs”. It is unclear what other “analysis of neural networks” the reviewer refers to; MEIs are one such approach to analyzing the predictive neural network.

    We also want to clarify that, while the reviewer is correct in stating that the MEI approach as used here only identifies a single peak, this does not mean that it cannot capture neuronal selectivities for a combination of features, as long as this combination of features can be described as a point in high-dimensional stimulus space. In fact, this is demonstrated in our manuscript for the case of G28/tSbC cell’s selectivity for large or full-field, sustained changes in chromatic contrast (a combination of spatial, temporal, and chromatic features). While approaches similar to the one used here generate several diverse exciting inputs (Ding et al. bioRxiv 2023) and could therefore also fully capture On-Off selectivities, we pointed out the limitation of MEIs when describing On-Off cells in the manuscript (both original and revised).

    Regarding the reviewer’s concern that “[...] the message that the MEI is the most important representative statistic [...] takes us back to the days when only a single sensory feature is appreciated”: It was certainly not our intention to proclaim MEIs as the ultimate representation of a cell’s response features, and we have clarified this in the revised manuscript. However, we also think that (i) by applying a nonlinear method to extract chromatic, temporal, and spatial response properties from natural movie responses, we go beyond many characterizations that use linear methods to extract only spatial or only temporal, achromatic response properties from static, white-noise stimuli. This said, we agree that (ii) expanding around the peak is desirable, and we do so in an additional analysis (new Fig. 6); but reducing complexity to a manageable degree (at least, at first) is useful and even necessary when discovering novel response properties.

    Concerning the representational similarity analysis (RSA): the point we were trying to make with this analysis is that the transformation implemented by G28 “warps” stimulus space and increases the discriminability of stimuli with characteristics similar to the cell’s MEI. We now make this point in a more accessible fashion through the above-mentioned analysis, in which we extended the estimate around the peak. We therefore agreed to remove the RSA from the paper.

    In the revised manuscript, we (a) discuss the advantages and limitations of the MEI approach in more detail (in Results and Discussion; see also our reply #1) and (b) replaced the RSA analysis.

    1. Usefulness of MEI approach over alternatives. It is claimed that analyzing the MEI is a useful approach to discovering novel neural coding properties, but to show the usefulness of a new tool, it is important to compare results to the traditional technique. The more standard approach would be to analyze the linear receptive field, which would usually come from the STA of white noise measurement, but here this could come from the linear (or linear-nonlinear) model fit to the natural scene response, or by computing an average linear filter from the natural scene model. It is important to assess whether the same conclusion about color opponency can come from this standard approach using the linear feature (average effective input), and whether the MEIs are qualitatively different from the linear feature. The linear feature should thus be compared to MEIs for Fig. 3 and 4, and the linear feature should be compared with the effects of natural stimuli in terms of chromatic contrast (Fig. 6b). With respect to the representation analysis (Fig. 5), although I don't believe this is meaningful for MEIs, if this analysis remains it should also be compared to a representation analysis using the linear feature. In fact, a representation analysis would be more meaningful when performed using the average linear feature as it summarizes a wider range of stimuli, although the most meaningful analysis would be directly on a broader range of responses, which is what is usually done.

    We agree that the comparison with a linear model is an important validation. Therefore, we performed an additional analysis (see also reply #1, as well as Fig. 6 and corresponding section in the manuscript) which demonstrates that an LN model does not recover the chromatic feature selectivity. This finding supports our claims about the usefulness of the MEI approach over linear approaches.
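    To build intuition for why a linear fit can fall short here, consider the following self-contained toy (our own construction, not the authors' LN analysis): a model cell that responds to the conjunction UV-ON and Green-OFF multiplicatively, so its chromatic opponency is intrinsically nonlinear. A least-squares linear fit still shows the opponent sign pattern in its weights, but captures only a minority of the response variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy cell: fires only for the conjunction UV-ON and Green-OFF,
# implemented as a product of rectified contrasts (a nonlinear AND).
n = 200_000
uv = rng.standard_normal(n)
green = rng.standard_normal(n)
resp = np.maximum(uv, 0) * np.maximum(-green, 0)

# Best linear (STA-like) model, fitted by least squares.
X = np.column_stack([uv, green, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, resp, rcond=None)
pred = X @ coef
r2 = 1 - np.var(resp - pred) / np.var(resp)

# coef[:2] shows the opponent signs (UV positive, green negative),
# yet r2 stays well below 1: most of the computation is invisible
# to the linear description.
```

This is only an illustration of the general limitation; the specific nonlinearity of G28/tSbC cells is characterized in the new Fig. 6 analysis mentioned above.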

    Regarding the comment on the representation analysis, as mentioned above, we consider it replaced by the analysis comparing results from an LN model and a nonlinear CNN.

    1. Definition of ethological problem. The ethological problem posed here is the detection of the horizon. The stimuli used do not appear to relate to this problem as they do not include the horizon and only include transitions across the horizon. It is not clear whether these stimuli would ever occur with reasonable frequency, as they would only occur with large vertical saccades, which are less common in mice. More common would be smooth transitions across the horizon, or smaller movements with the horizon present in the image. In this case, cells which have a spatial chromatic opponency (which the authors claim are distinct from the ones studied here) would likely be more important for use in chromatic edge detection or discrimination. Therefore the ethological relevance of any of these analyses remains in question.

    It is further not clear if detection is even the correct problem to consider. The horizon is always present, but the problem is to determine its location, a conclusion that will likely come from a population of cells. This is a distinct problem from detecting a small object, such as a small object against the background of the sky, which may be a more relevant problem to consider.

    Thank you for giving us the opportunity to clear these things up. First, we would like to clarify that we propose that G28/tSbC cells contribute to detecting context changes, such as transitions across the horizon from ground to sky, not to detecting the horizon itself. We acknowledge that we were not clear enough about this in the manuscript and have corrected this. To back up our hypothesis that G28 RGCs contribute to detecting context changes, we performed an additional simulation analysis, which is described in our reply #3 (see above).

    1. Difference in cell type from those previously described. It is claimed that the chromatic opponent cells are different from those previously described based on the MEI analysis, but we cannot conclude this because previous work did not perform an MEI analysis. An analysis should be used that is comparable to previous work, the linear spatiotemporal receptive field should be sufficient. However, there is a concern that because linear features can change with stimulus statistics (Hosoya et al., 2005), a linear feature fit to natural scenes may be different than those from previous studies even for the same cell type. The best approach would likely be presenting a white noise stimulus to the natural scenes model to compute a linear feature, which still carries the assumption that this linear feature from the model fit to a natural stimulus would be comparable to previous studies. If the previous cells have spatial chromatic opponency and the current cells only have chromatic opponency in the center, there should be both types of cells in the current data set. One technical aspect relating to this is that MEIs were space-time separable. Because the center and surround have a different time course, enforcing this separability may suppress sensitivity in the surround. Therefore, it would likely be better if this separability were not enforced in determining whether the current cells are different than previously described cells. As to whether these cells are actually different than those previously described, the authors should consider the following uncited work; (Ekesten & Gouras, 2005), which identified chromatic opponent cells in mice in approximate numbers to those here (~2%). In addition, (Yin et al., 2009) in guinea pigs and (Michael, 1968) in ground squirrels found color-opponent ganglion cells without effects of a spatial surround as described in the current study.

    First of all, we did not intend to claim to have discovered a completely new type of color-opponent tuning in general; what we were trying to say is that tSbC cells display spatially co-extensive color opponency, a feature selectivity previously not described in this mouse RGC type, and which may be used to signal context changes as elicited by ground-sky transitions.

    Concerning the reviewer’s first argument about a lack of comparability of our results to results previously obtained with a different approach: We think that this is now addressed by the new analysis (new Fig. 6), where we show why linear methods are limited in their capability to recover the type of color opponency that we discovered with the MEI approach.

    Regarding the argument about center-surround opponency, we agree that “if the previous cells have spatial chromatic opponency and the current cells only have chromatic opponency in the center, there should be both types of cells in the current data set”. We did not focus on analyzing center-surround opponency in the present study, but from the MEIs, it is visible that many cells have a stronger antagonistic surround in the green channel compared to the UV channel (see Fig. 4a, example RGCs of G21, G23, G24; Figure 3-supplement 1 example RGCs of G21, G23, G24, G31, G32). Importantly, the MEIs shown in Fig. 4a were also shown in the verification experiment, and had G28 RGCs preferred this kind of stimulus, they would have responded preferentially to these MEIs, which was not the case (Fig. 4f).

    It should also be noted here that, while the model’s filters were space-time separable, we did not impose a restriction on the MEIs to be space-time separable during optimization. However, we analyzed only the rank-1 components of the MEIs (see Methods section Validating MEIs experimentally), since our analysis focused on aspects of retinal processing not contingent on spatiotemporal interactions in the stimulus.

    In summary, we are convinced that our finding of center-opponency in G28 is not an artifact of the methodology.

    We discuss this in the manuscript and add the references mentioned by the reviewer to the respective part of the Discussion.

    Reviewer #3 (Public Review):

    This study aims to discover ethologically relevant feature selectivity of mouse retinal ganglion cells. The authors took an innovative approach that uses large-scale calcium imaging data from retinal ganglion cells stimulated with both artificial and natural visual stimuli to train a convolutional neural network (CNN) model. The resulting CNN model is able to predict stimuli that maximally excite individual ganglion cell types.

    The authors discovered that modeling suggests that the "transient suppressed-by-contrast" ganglion cells are selectively responsive to Green-Off, UV-On contrasts, a feature that signals the transition from the ground to the sky when the animal explores the visual environment. They tested this hypothesis by measuring the responses of these suppressed-by-contrast cells to natural movies, and showed that these cells are preferentially activated by frames containing ground-to-sky transitions and exhibit the highest selectivity of this feature among all ganglion cell types. They further verified this novel feature selectivity by single-cell patch clamp recording.

    This work is of high impact because it establishes a new paradigm for studying feature selectivity in visual neurons. The data and analysis are of high quality and rigor, and the results are convincing. Overall, this is a timely study that leverages rapidly developing AI tools to tackle the complexity of both natural stimuli and neuronal responses and provides new insights into sensory processing.

    We thank the reviewer for appreciating our study.

  2. eLife assessment

    This paper makes a valuable contribution to approaches to studying the stimulus selectivity of sensory neurons. The imaging data that forms the core of the paper is compelling, but the evidence for some of the conclusions reached is limited. A central issue is a reliance on linear measures of stimulus selectivity, which may miss key aspects of retinal coding.

  3. Reviewer #1 (Public Review):

    This paper combines a number of cutting-edge approaches to explore the role of a specific mouse retinal ganglion cell type in visual function. The approaches used include calcium imaging to measure responses of RGC populations to a collection of visual stimuli and CNNs to predict the stimuli that maximally activate a given ganglion cell type. The predictions about feature selectivity are tested and used to generate a hypothesized role in visual function for the RGC type identified as interesting. The paper is impressive; my comments are all related to how the work is presented.

    Is the MEI approach needed to identify these cells?
    To briefly summarize the approach, the paper fits a CNN to the measured responses to a range of stimuli, extracts the stimulus (over time, space, and color) that is predicted to produce a maximal response for each RGC type, and then uses these MEIs to investigate coding. This reveals that G28 shows strong selectivity for its own MEI over those of other RGC types. The feature of the G28 responses that differentiate it appears to be its spatially-coextensive chromatic opponency. This distinguishing feature, however, should be relatively easy to discover using more standard approaches.
    The concern here is that the paper could be read as indicating that standard approaches to characterizing feature selectivity do not work and that the MEI/CNN approach is superior. There may be reasons why the latter is true that I missed or that were not spelled out clearly. I do think the MEI/CNN approach as used in the paper provides a very nice way to compare feature selectivity across RGC types - and that it seems very well suited in this context. But it is less clear that it is needed for the initial identification of the distinguishing response features of the different RGC types.
    What would be helpful for me, and I suspect for many readers, is a more nuanced and detailed description of where the challenges arise in standard feature identification approaches and where the MEI/CNN approaches help overcome those challenges.
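    For readers unfamiliar with the procedure under discussion: MEI synthesis amounts to gradient ascent on the model's predicted response, typically under a norm constraint that keeps the optimized stimulus in a realistic range. A minimal sketch with a stand-in analytic model follows (all names are illustrative, not the authors' code; a real application would backpropagate through the fitted CNN):

```python
import numpy as np

def most_exciting_input(grad, s0, lr=0.1, n_steps=200, max_norm=1.0):
    """Toy MEI synthesis: gradient ascent on a model's predicted response,
    projecting back onto a norm ball after each step. `grad` stands in for
    the CNN's input gradient (here supplied analytically)."""
    s = s0.astype(float).copy()
    for _ in range(n_steps):
        s = s + lr * grad(s)           # ascend the predicted response
        n = np.linalg.norm(s)
        if n > max_norm:               # norm constraint on the stimulus
            s *= max_norm / n
    return s

# Stand-in "model": predicted response -||s - w||^2 peaks at template w
w = np.array([0.6, -0.8]) * 0.9        # the true optimum (norm 0.9)
grad = lambda s: -2.0 * (s - w)        # analytic input gradient
mei = most_exciting_input(grad, np.zeros(2))
```

    The sketch makes the reviewer's point concrete: the procedure returns a single point in stimulus space, however complex the underlying response landscape.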

    Interpretation of MEI temporal structure
    Some aspects of the extracted MEIs look quite close to those that would be expected from more standard measurements of spatial and temporal filtering. Others - most notably some of the temporal filters - do not. In many of the cells, the temporal filters oscillate much more than linear filters estimated from the same cells. In some instances, this temporal structure appears to vary considerably across cells of the same type (Fig. S2). These issues - both the unusual temporal properties of the MEIs and the heterogeneity across RGCs of the same type - need to be discussed in more detail.
    Related to this point, it would be nice to understand how much of the difference in responses to MEIs in Figure 4d is from differences in space, time, or chromatic properties. Can you mix and match MEI components to get an estimate of that? This is particularly relevant since G28 responds quite well to the G24 MEI.

    Explanation of RDM analysis
    I really struggled with the analysis in Figure 5b-c. After reading the text several times, this is what I think is happening. Starting with a given RGC type (#20 in Figure 5b), you take the response of each cell in that group to the MEI of each RGC type, and plot those responses in a space where the axes correspond to responses of each RGC of this type. Then you measure the Euclidean distance between the responses to a pair of MEIs and collect those distances in the RDM matrix. Whether correct or not, this took some time to arrive at and meant filling in some missing pieces in the text. That section should be expanded considerably.
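    If this reading is correct, the computation can be stated compactly; a minimal sketch (array names and shapes are my assumptions, not the authors' code):

```python
import numpy as np

def rdm_from_mei_responses(responses):
    """Representational dissimilarity matrix as I understand the method.

    responses: (n_cells, n_meis) array -- response of each cell of one
    RGC type to the MEI of each type. Each MEI's population response is a
    point in n_cells-dimensional space; RDM entry (i, j) is the Euclidean
    distance between the population responses to MEI i and MEI j.
    """
    r = np.asarray(responses, dtype=float)
    # rows of r.T are points in cell-response space, one per MEI
    diffs = r.T[:, None, :] - r.T[None, :, :]   # (n_meis, n_meis, n_cells)
    return np.linalg.norm(diffs, axis=-1)
```

    Spelling the procedure out at roughly this level of detail in the Methods would remove the guesswork.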

    Centering of MEIs
    How important is the lack of precise centering of the MEIs when you present them? It would be helpful to have some idea about that - either from direct experiments or using a model.

  4. Reviewer #2 (Public Review):

    This paper uses two-photon imaging of mouse ganglion cells responding to chromatic natural scenes along with convolutional neural network (CNN) models fit to the responses of a large set of ganglion cells. The authors analyze CNN models to find the most effective input (MEI) for each ganglion cell as a novel approach to identifying ethological function. From these MEIs they identify chromatic opponent ganglion cells, and then further perform experiments with natural stimuli to interpret the ethological function of those cells. They conclude that a type of chromatic opponent ganglion cell is useful for the detection of the transition from the ground to the sky across the horizon. The experimental techniques, data, and fitting of CNN models are all high quality. However, there are conceptual difficulties with both the use of MEIs to draw conclusions about neural function and the ethological interpretations of experiments and data analyses, as well as a lack of comparison with standard approaches. These bear directly both on the primary conclusions of the paper and on the utility of the new approaches.

    1. Claim of feature detection. The color opponent cells are cast as a "feature detector" and the term 'detector' is in the title. However, insufficient evidence is given for this, and it seems likely to be a mischaracterization. An example of a ganglion cell that might qualify as a feature detector is the W3 ganglion cell (Zhang et al., 2012). These cells are mostly silent and only fire if there is differential motion on a mostly featureless background. Although this previous work does not conduct an ROC analysis, the combination of strong nonlinearity and strong selectivity is important here, giving good qualitative support for these cells as participating in the function of detecting differential motion against the sky. In the present case, the color opponent cells respond to many stimuli, not just transitions across the horizon. In addition, for the receiver operating characteristic (ROC) analysis as to whether these cells can discriminate transitions across the horizon, the area under the curve (AUC) is on average 0.68. Although there is not a particular AUC threshold for a detector or diagnostic test to have good discrimination, a value of 0.5 is chance, and values between 0.5 and 0.7 are considered poor discrimination, 'not much better than a coin toss' (Applied Logistic Regression, Hosmer et al., 2013, p. 177). The data in Fig. 6F is also more consistent with a general chromatic opponent cell that is not highly selective. These cells may contribute information to the problem of discriminating sky from ground, but also to many other ethologically relevant visual determinations. Characterizing them as feature detectors seems inappropriate and may distract from other functional roles, although they may participate in feature detection performed at a higher level in the brain.
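    For reference, the AUC quoted here is equivalent to the probability that a randomly chosen response to a ground-to-sky transition exceeds a randomly chosen response to a non-transition (the rank-sum identity). A minimal sketch of that quantity, with made-up inputs (not the authors' data or code):

```python
import numpy as np

def auc(pos, neg):
    """Area under the ROC curve via the Mann-Whitney identity:
    P(random positive response > random negative response),
    counting ties as 1/2."""
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

    Under this reading, an AUC of 0.68 means a transition response outranks a non-transition response only about two times in three, which is the basis for the 'poor discrimination' characterization above.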

    2. Appropriateness of MEI analysis for interpretations of the neural code. There is a fundamental incompatibility between the need to characterize a system with a complex nonlinear CNN and then characterizing cells with a single MEI. MEIs represent the peak in a complex landscape of a nonlinear function, and that peak may or may not occur under natural conditions. For example, MEIs do not account for On-Off cells, On-Off direction selectivity, nonlinear subunits, object motion sensitivity, and many other nonlinear cell properties where multiple visual features are combined. MEIs may be a useful tool for clustering and distinguishing cells, but there is not a compelling reason to think that they are representative of cell function. This is an open question, and thus it should not be assumed as a foundation for the study. This paper potentially speaks to this issue, but there is more work to support the usefulness of the approach. Neural networks enable a large set of analyses to understand complex nonlinear effects in a neural code, and it is well understood that the single-feature approach is inadequate for a full understanding of sensory coding. A great concern is that the message that the MEI is the most important representative statistic directs the field away from the primary promise of the analysis of neural networks and takes us back to the days when only a single sensory feature was appreciated, now the MEI instead of the linear receptive field. It is appropriate to use MEI analyses to create hypotheses for further experimental testing, and the paper does this (and states as much), but it further takes the point of view that the MEI is generally informative as the single best summary of the neural code. The representation similarity analysis (Fig. 5) acts on the unfounded assumption that MEIs are generally representative and conveys this point of view, but it is not clear whether anything useful can be drawn from this analysis, and therefore this analysis does not support the conclusions about changes in the representational space. Overall this figure detracts from the paper and can safely be removed. In addition, in going from MEI analysis to testing ethological function, it should be made much more clear that MEIs may not generally be representative of the neural code, especially when nonlinearities are present that require the use of more complex models such as CNNs, and thus testing with other stimuli is required.

    3. Usefulness of MEI approach over alternatives. It is claimed that analyzing the MEI is a useful approach to discovering novel neural coding properties, but to show the usefulness of a new tool, it is important to compare results to the traditional technique. The more standard approach would be to analyze the linear receptive field, which would usually come from the spike-triggered average (STA) of a white-noise measurement, but here this could come from the linear (or linear-nonlinear) model fit to the natural scene response, or by computing an average linear filter from the natural scene model. It is important to assess whether the same conclusion about color opponency can come from this standard approach using the linear feature (average effective input), and whether the MEIs are qualitatively different from the linear feature. The linear feature should thus be compared to MEIs for Fig. 3 and 4, and the linear feature should be compared with the effects of natural stimuli in terms of chromatic contrast (Fig. 6b). With respect to the representation analysis (Fig. 5), although I don't believe this is meaningful for MEIs, if this analysis remains it should also be compared to a representation analysis using the linear feature. In fact, a representation analysis would be more meaningful when performed using the average linear feature as it summarizes a wider range of stimuli, although the most meaningful analysis would be directly on a broader range of responses, which is what is usually done.
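    The standard linear feature referred to here, the spike-triggered average, is straightforward to compute; a generic sketch under simple assumptions (hypothetical array shapes and names, not the authors' pipeline):

```python
import numpy as np

def spike_triggered_average(stimulus, spikes, n_lags):
    """Linear receptive-field estimate from white noise: the average of
    the stimulus segments preceding each spike, weighted by spike count.

    stimulus: (T, ...) white-noise frames; spikes: (T,) spike counts.
    Returns an (n_lags, ...) spatiotemporal filter.
    """
    T = stimulus.shape[0]
    sta = np.zeros((n_lags,) + stimulus.shape[1:])
    total = 0.0
    for t in range(n_lags, T):
        if spikes[t] > 0:
            sta += spikes[t] * stimulus[t - n_lags:t]
            total += spikes[t]
    return sta / total if total else sta
```

    Presenting white noise to the fitted natural-scene model and computing this quantity per color channel would give the linear feature against which the MEIs should be compared.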

    4. Definition of ethological problem. The ethological problem posed here is the detection of the horizon. The stimuli used do not appear to relate to this problem as they do not include the horizon and only include transitions across the horizon. It is not clear whether these stimuli would ever occur with reasonable frequency, as they would only occur with large vertical saccades, which are less common in mice. More common would be smooth transitions across the horizon, or smaller movements with the horizon present in the image. In this case, cells which have a spatial chromatic opponency (which the authors claim are distinct from the ones studied here) would likely be more important for use in chromatic edge detection or discrimination. Therefore the ethological relevance of any of these analyses remains in question.

    It is further not clear if detection is even the correct problem to consider. The horizon is always present, but the problem is to determine its location, a conclusion that will likely come from a population of cells. This is a distinct problem from detecting a small object, such as a small object against the background of the sky, which may be a more relevant problem to consider.

    5. Difference in cell type from those previously described. It is claimed that the chromatic opponent cells are different from those previously described based on the MEI analysis, but we cannot conclude this because previous work did not perform an MEI analysis. An analysis should be used that is comparable to previous work; the linear spatiotemporal receptive field should be sufficient. However, there is a concern that because linear features can change with stimulus statistics (Hosoya et al., 2005), a linear feature fit to natural scenes may be different from those of previous studies even for the same cell type. The best approach would likely be presenting a white noise stimulus to the natural scenes model to compute a linear feature, which still carries the assumption that this linear feature from the model fit to a natural stimulus would be comparable to previous studies. If the previous cells have spatial chromatic opponency and the current cells only have chromatic opponency in the center, there should be both types of cells in the current data set. One technical aspect relating to this is that MEIs were space-time separable. Because the center and surround have a different time course, enforcing this separability may suppress sensitivity in the surround. Therefore, it would likely be better if this separability were not enforced in determining whether the current cells are different from previously described cells. As to whether these cells are actually different from those previously described, the authors should consider the following uncited work: (Ekesten & Gouras, 2005), which identified chromatic opponent cells in mice in numbers comparable to those here (~2%). In addition, (Yin et al., 2009) in guinea pigs and (Michael, 1968) in ground squirrels found color-opponent ganglion cells without effects of a spatial surround as described in the current study.

  5. Reviewer #3 (Public Review):

    This study aims to discover ethologically relevant feature selectivity of mouse retinal ganglion cells. The authors took an innovative approach that uses large-scale calcium imaging data from retinal ganglion cells stimulated with both artificial and natural visual stimuli to train a convolutional neural network (CNN) model. The resulting CNN model is able to predict stimuli that maximally excite individual ganglion cell types. Their modeling suggested that the "transient suppressed-by-contrast" ganglion cells are selectively responsive to Green-Off, UV-On contrasts, a feature that signals the transition from the ground to the sky as the animal explores the visual environment. They tested this hypothesis by measuring the responses of these suppressed-by-contrast cells to natural movies, and showed that these cells are preferentially activated by frames containing ground-to-sky transitions and exhibit the highest selectivity for this feature among all ganglion cell types. They further verified this novel feature selectivity by single-cell patch clamp recording.

    This work is of high impact because it establishes a new paradigm for studying feature selectivity in visual neurons. The data and analysis are of high quality and rigor, and the results are convincing. Overall, this is a timely study that leverages rapidly developing AI tools to tackle the complexity of both natural stimuli and neuronal responses and provides new insights into sensory processing.