Nonlinear spatial integration underlies the diversity of retinal ganglion cell responses to natural stimuli

Abstract

How neurons encode natural stimuli is a fundamental question for sensory neuroscience. In the early visual system, standard encoding models assume that neurons linearly filter incoming stimuli through their receptive fields, but artificial stimuli, such as reversing gratings, often reveal nonlinear spatial processing. We investigated whether such nonlinear processing is relevant for the encoding of natural images in ganglion cells of the mouse retina. We found that standard linear receptive field models fail to capture the spiking activity for a large proportion of cells. These cells displayed pronounced sensitivity to fine spatial contrast, and local signal rectification was identified as the dominant nonlinearity. In addition, we also observed a class of nonlinear ganglion cells with opposite tuning for spatial contrast and a particular sensitivity for spatially homogeneous stimuli. Our work highlights receptive field nonlinearities as a crucial component for understanding early sensory encoding in the context of natural stimuli.

Article activity feed

  1. ##Author Response

    We would like to thank the three reviewers for their efforts and the constructive feedback. Below, we describe how we will address the reviewers’ comments in an updated manuscript.

    ###Summary:

    All of the reviewers expressed concerns about the advance that the work described in the paper represents. These issues were a focus of the consultation among the reviewers. The main concern is that the work needs to go beyond demonstrating that some ganglion cells exhibit nonlinear integration for naturalistic inputs - as that point is quite well established in the literature. The comparison between natural stimuli and gratings could help in this regard, but several issues confound that comparison (e.g. differences in dynamics of the two types of stimuli). These concerns are detailed in the individual reviews below.

    ###Reviewer #1:

    This paper investigates how retinal ganglion cells integrate inputs across space, with a focus on natural images. Nonlinear spatial integration is a well-studied property of ganglion cells, but it has been largely characterized using grating stimuli. A few studies have extended this to look at spatial integration in the context of natural images, but we certainly lack a comprehensive treatment of that issue. The current paper has a number of strengths - notably using a number of complementary stimuli and analysis tools to study a large population of ganglion cells and linking properties of responses to artificial stimuli with those to natural stimuli. It also has a few weaknesses (some detailed carefully in the paper) - such as the inability to identify ganglion cell types (aside from a few), and to pinpoint specific circuit mechanisms. These are limitations of the techniques used. This is not a request as much as setting the context of the contribution of the paper. Generally the paper was in good shape, and the data supported the conclusions well. I do think there are a number of issues that could be strengthened. Those are listed below in rough order of importance.

    Statistical correlations in natural scenes:

    A number of analyses in the paper rely on estimating the spatial contrast from an image and comparing the dependence of various measures of the cells' responses on spatial contrast. A danger in this analysis is that spatial contrast is likely correlated with many other statistical properties of the image, so attributing a given response property to spatial contrast has some potential confounds. This issue should be discussed as a possible caveat, unless the authors can rule it out. The paper, accurately, describes the results in terms of correlations (and not causal relationships), but some discussion of the complexity of natural image statistics would be helpful.

    Spatial contrast is defined in our work via the variance of pixel intensity inside the receptive field. Indeed, spatial contrast may reflect different aspects of visual scenes, such as object boundaries, textures, or gradients in light intensity. Differences in the effects of these image features on a ganglion cell’s response will not be captured by our analysis. However, the goal of relating spatial contrast to spike count was primarily to analyze whether the spatial structure of light intensity inside the receptive field was related to the response of a given ganglion cell (beyond the mean illumination), and the pixel intensity variance provides a simple, straightforward measure of this spatial structure. To clarify this aspect and better relate it to the complexity of natural images, we will add a corresponding paragraph in the Discussion.
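    As a minimal sketch of this kind of measure (not the manuscript's exact procedure), the mean intensity and the spatial contrast inside a receptive field can be computed from the pixels weighted by the receptive field profile. The function name `rf_weighted_stats`, the use of receptive field weights, and the choice of the standard deviation (square root of the variance) are assumptions made here for illustration.

```python
import math

def rf_weighted_stats(pixels, weights):
    """Return (mean intensity, spatial contrast) of pixels under RF weights.

    pixels  : flat list of pixel contrast values inside the receptive field
    weights : flat list of non-negative RF weights (same length)
    Spatial contrast is taken here as the weighted standard deviation,
    i.e. the square root of the weighted pixel variance (an assumption).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(w * p for w, p in zip(weights, pixels)) / total
    var = sum(w * (p - mean) ** 2 for w, p in zip(weights, pixels)) / total
    return mean, math.sqrt(var)

# Two hypothetical patches with the same mean but different spatial structure.
uniform_patch = [0.2, 0.2, 0.2, 0.2]
textured_patch = [0.8, -0.4, 0.8, -0.4]
weights = [1.0, 1.0, 1.0, 1.0]

mean_u, sc_u = rf_weighted_stats(uniform_patch, weights)
mean_t, sc_t = rf_weighted_stats(textured_patch, weights)
```

    Both patches yield the same mean intensity, so a model with a linear receptive field would treat them alike; only the spatial contrast value separates them.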

    Comparison of grating and natural scene spatial scale:

    The section starting around line 233 was confusing for several reasons. First, this section starts by measuring the spatial scale associated with the grating responses, and then comparing that to LN model performance for natural inputs. It's not clear why the spatial scale is the relevant aspect of the responses to gratings. Indeed, the next paragraph provides a measure of the relative sensitivity of the nonlinear and linear response components (via a comparison of F1 and F2 responses). It would be helpful to include some initial text to motivate the different measures of the grating responses and to anticipate that you will look at both spatial scale and sensitivity.

    A related issue that bears more directly on the scientific conclusions comes up later in the blurring experiments. The issue is whether it is valid to directly compare the apparent spatial scale of nonlinear responses to images (estimated via blurring) with that of the grating responses. Natural images should have much higher power at low spatial frequencies, and this may strongly impact the spatial scale identified with the blurring experiments.

    We agree that the writing may not have been entirely clear, and we will reorganize the material to discuss the extracted spatial scale and nonlinearity index in parallel as suggested. Regarding the difference in spatial scales from reversing gratings and blurred natural images: yes, it is also our interpretation that the power at low spatial frequencies plays a key role. Our main point here was to assess whether and to what degree the typical analyses of spatial nonlinearity as measured from reversing gratings translate to natural images despite the differences in spatial and temporal structure of the two stimulus classes. In a revised manuscript, we will clarify the role of low spatial frequencies earlier in the text.
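    For readers unfamiliar with the F1/F2 comparison mentioned by the reviewer, the sketch below computes the amplitudes of the first and second harmonic of a response histogram that spans one grating reversal period, and their ratio. It is an illustration under stated assumptions: the exact index used in the manuscript is not given here, and the names `harmonic_amplitude` and `f2_f1_ratio` as well as the synthetic responses are hypothetical.

```python
import cmath
import math

def harmonic_amplitude(psth, harmonic):
    """Amplitude of one Fourier harmonic of a PSTH covering one stimulus period."""
    n = len(psth)
    coeff = sum(rate * cmath.exp(-2j * math.pi * harmonic * k / n)
                for k, rate in enumerate(psth))
    return 2.0 * abs(coeff) / n

def f2_f1_ratio(psth):
    """Ratio of second-harmonic to first-harmonic amplitude (assumed index)."""
    f1 = harmonic_amplitude(psth, 1)
    f2 = harmonic_amplitude(psth, 2)
    return f2 / f1 if f1 > 0 else math.inf

n_bins = 64
phase = [2 * math.pi * k / n_bins for k in range(n_bins)]

# Synthetic "linear" cell: firing rate follows the stimulus frequency only.
psth_linear = [10 + 5 * math.sin(p) for p in phase]
# Synthetic "nonlinear" cell: strong frequency-doubled component.
psth_nonlinear = [10 + 1 * math.sin(p) + 4 * math.sin(2 * p) for p in phase]

ratio_linear = f2_f1_ratio(psth_linear)
ratio_nonlinear = f2_f1_ratio(psth_nonlinear)
```

    A ratio near zero indicates a response dominated by the stimulus frequency, whereas a large ratio indicates frequency doubling, the classical signature of nonlinear spatial integration under reversing gratings.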

    Clustering of orientation-selective cells:

    An interesting suggestion in the paper is that the orientation-selective cells can be divided into two groups that differ in their spatial integration properties. Do these groups represent different orientations, as suggested in the text? That seems a simple piece of information to add. Related to this, I would suggest moving Figure S4 into the main text.

    We do not have information about the absolute preferred orientations of the orientation-selective (OS) cells, as we did not keep track of retinal orientation when placing the retinas on the multielectrode array. At this point, we can therefore only rely on indirect analyses of relative preferred orientations between pairs of OS cells in the same retina. These indicate that pairs of two nonlinear OS cells tend to have aligned preferred orientation (and similarly for pairs of linear OFF OS cells), but pairs of a linear and a nonlinear OFF cell tend to have divergent preferred orientations. This is shown in Fig. S4C. For a revised manuscript, we will consider integrating Fig. S4 into the main text, as suggested.

    Presentation of checkerboard stimuli and results:

    The checkerboard analysis, particularly how it isolates properties of spatial integration, could get introduced more thoroughly for a reader unfamiliar with it. A related issue is how well the chosen isoresponse contour captures structure in the full distribution of responses. In some cases that looks pretty good, but in others it is less clear. Could you add a supplementary figure or something similar that characterizes how consistent the isoresponse contours are for different response levels?

    These are good suggestions, and we will aim to clarify the analysis as proposed and add information about the consistency of iso-response contours for different response levels. In the present analysis, the iso-response contours are used just for illustration, whereas the quantification of rectification and integration of preferred contrast are extracted from specific points in the stimulus-response space, which we found to work robustly for a population analysis without being strongly affected by threshold or saturation effects of the cells. We will explain this more clearly in a revised manuscript.

    Drift in responses over time:

    Some of the rasters - e.g. the bottom left in Figure 1C - show considerable drift over time. It is important that this drift not be interpreted as a failure of the LN model and hence indicative of nonlinear spatial integration. Can you test for drift like this across cells, and exclude any that seem potentially problematic? More generally, some assurance that the variability in the responses for a given generator signal value is real variability across images is needed.

    The presentation of all 300 natural images over ten trials takes about 50 minutes, and some drift over this period seems unavoidable. To minimize systematic effects of experimental drift on the measured average responses for different images, we applied randomization within trials, which ensured that all images were presented once in random order in each trial before the next trial started. In addition, to quantify the real variability over images of the average response for a given generator signal, we applied a goodness-of-fit measure (CCnorm) that takes into account variability over trials.

    We now also tested directly for the drift mentioned by the reviewer, but observed sizeable effects in only a small subset of cells that were included in the analysis. In most cases, drift corresponded to a global scaling that approximately affected responses to all images proportionally. This is reflected in a high correlation over images between the average responses of the first five and last five trials; 94% of analyzed cells had a correlation coefficient of at least 0.7. Such global scaling of responses does not affect the analysis of differences in average responses. In a revised manuscript, we will provide analyses of drift effects and exclude cells that contain drift effects that appear to deviate from global response scaling.
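    The drift check described above can be sketched as follows, assuming spike counts are stored as one list per trial with one entry per image. The function names and the example data are hypothetical; only the comparison of the first five and last five trials and the 0.7 criterion come from the text.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for constant responses
    return sxy / math.sqrt(sxx * syy)

def early_late_correlation(counts, n_half=5):
    """Correlate, over images, the mean response of early and late trials.

    counts[trial][image] holds the spike count for one image presentation.
    """
    early, late = counts[:n_half], counts[-n_half:]
    n_images = len(counts[0])
    early_mean = [sum(t[i] for t in early) / len(early) for i in range(n_images)]
    late_mean = [sum(t[i] for t in late) / len(late) for i in range(n_images)]
    return pearson(early_mean, late_mean)

# Hypothetical cell whose responses scale down globally by 40% in late trials.
base = [2, 5, 9, 1, 7, 4]
counts = [base] * 5 + [[0.6 * c for c in base]] * 5
r = early_late_correlation(counts)
passes_criterion = r >= 0.7
```

    Because a global scaling changes all responses proportionally, the correlation over images stays close to one in this example, which is why such drift does not affect comparisons of relative responses.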

    ###Reviewer #2:

    Summary:

    Understanding how retinal ganglion cells respond to natural stimuli is a central but daunting question, which retinal neurophysiologists have begun to tackle recently. Here Karamanlis and Gollisch perform large-scale multi-electrode recordings in the mouse retina and demonstrate that the responses of many ganglion cells cannot be predicted by standard linear-nonlinear models (L-LN). They go on to test a variety of clever artificial stimuli that emphasize and allow for the quantification of the non-linear aspects of RGC responses and convincingly demonstrate that non-linear processing is associated with sensitivity to fine spatial contrasts (subunits) and local rectification. While these aspects of RGC receptive fields have been previously described, demonstrating their applicability to natural vision is a significant advancement.

    Major Comments:

    My first main concern is with the way the paper is written. It does not highlight the significant advancements but rather emphasizes what is already known from other studies. For example, many of the conclusions of non-linear spatial integration & signal rectification arising in bipolar cells have been well described previously. By contrast, novel aspects like the sensitivity of reversal gratings being unrelated to LN model performance for natural scenes should be explained in more detail. The authors should more clearly state the major advancements that are being made here beyond what has already been shown previously (e.g. Turner and Rieke, 2016).

    It is possible that our efforts to provide context by relating our results to established findings in retinal signal integration overshadowed the novel aspects of our work. As suggested, we will aim at pointing out these aspects more clearly. For example, compared to the work of Turner and Rieke (2016), we a) focused on a different species with more diversity in accessible RGC types, b) generalized the connection of spatial integration and natural scene encoding to a wider range of cell types (e.g. including also spatially linear and nonlinear ON-OFF cells as well as cells that are inversely sensitive to spatial contrast), and c) developed methods to assess and quantitatively characterize subunit nonlinearities with multielectrode recordings of many cells in parallel, without the need for intracellular recordings or knowledge of the receptive field location.

    Second, the authors never include non-linear subunits in their model to demonstrate improved performance. Testing models with filters that incorporate rectification and convexity as experimentally determined will enable them to show their utility more convincingly. Without this, the reader is left with the conclusion that there are RGCs that exhibit non-linear or linear spatial integration (already known) and that non-linear integrators cause LN models to perform poorly with natural images (Turner and Rieke, 2016).

    The aim of the present work was to assess how well models with linear receptive fields account for responses to natural images in various cells of the mouse retina and whether the models’ shortcomings can be related to the cells’ spatial stimulus integration characteristics. While we agree that models with nonlinear subunits could help support the conclusions, fitting such models to recorded data is – we believe – beyond the scope of the current manuscript. The many parameters of nonlinear subunit models, such as the number, shape, and layout of subunits or their nonlinearity and weight, all likely vary considerably across the diverse population of cells in our recordings. To avoid extensive parameter fitting, simplified models with ad hoc selection of subunit layouts and nonlinearities could help assess whether spatial nonlinearities are important, as in the work by Turner and Rieke (2016). As an alternative, we chose to analyze the importance of spatial nonlinearities via the effect of spatial contrast in images with similar mean intensity in the receptive field (e.g. Fig. 2). For our data, an advantage of this approach is that it is directly applicable to cell types with diverse spatial integration characteristics, such as the cells that are inversely sensitive to spatial contrast, which wouldn’t be captured by a standard subunit model with rectifying subunit nonlinearities. In future work, however, we plan to analyze subunit models that can account for the diversity of observed response patterns.
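    To make the distinction between the two model classes concrete, the toy example below contrasts a model that sums pixels linearly before a rectifying output with one that rectifies each input before summation. It is not a model fitted or used in the manuscript; the one-pixel "subunits", the threshold-linear nonlinearity, and all names are assumptions chosen for brevity.

```python
def linear_rf_response(pixels, weights):
    """LN-style model: weighted sum over the receptive field, then rectification."""
    generator = sum(w * p for w, p in zip(weights, pixels))
    return max(generator, 0.0)

def subunit_response(pixels, weights):
    """Toy subunit model: each input is rectified before weighted summation."""
    return sum(w * max(p, 0.0) for w, p in zip(weights, pixels))

weights = [0.25, 0.25, 0.25, 0.25]
uniform = [0.0, 0.0, 0.0, 0.0]       # no spatial contrast, mean 0
textured = [0.5, -0.5, 0.5, -0.5]    # high spatial contrast, mean 0

lin_uniform = linear_rf_response(uniform, weights)
lin_textured = linear_rf_response(textured, weights)
sub_uniform = subunit_response(uniform, weights)
sub_textured = subunit_response(textured, weights)
```

    Both stimuli have the same mean, so the linear model responds identically to them, while the rectifying subunit model responds only to the textured one. A cell that is inversely sensitive to spatial contrast would show the opposite ordering and is therefore not captured by this rectifying sketch.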

    Third, I'm not sure how 'natural' their natural images are, given static images are flashed over the cell intermittently. While such stimuli might simulate some sort of saccadic eye movements, whether this is relevant for mouse vision is not clear. Would linear models be more predictive for responses to natural movies? Some discussion on this issue would be helpful.

    Rather than aiming for fully natural movie-like stimuli, we used flashed images in our work to focus on aspects of spatial integration. This indeed entails a simplification of the temporal structure of natural stimuli, which was intended, but it preserves natural spatial structure, such as the occurrence of objects, boundaries, textures, and intensity gradients, as well as continuously decreasing power for higher spatial frequencies. Nonlinear spatial integration in the presence of this natural spatial structure will likely also shape responses under natural movies. To clarify this approach, we will re-evaluate our wording regarding the application of natural stimuli in our work and discuss the simplification compared to natural movies, as suggested.

    ###Reviewer #3:

    The manuscript by Karamanlis and Gollisch examines the responses of mouse retinal ganglion cells (RGCs) to natural stimuli. The primary conclusion of the manuscript is that spatial integration of stimuli within the receptive field is nonlinear. This nonlinear integration is consistent with "local signal rectification". This results in a set of RGCs that are sensitive to spatial contrast within the RF. The Authors also note the presence of cells that are suppressed by contrast and cells that prefer uniform stimulation of the RF. To reach these conclusions the authors use multi-electrode array recordings from isolated mouse retina. Spatial RFs are estimated using white noise stimuli, which are then used to generate a null-model for linear spatial summation. They compare predictions of this null-model to the responses of the same RGCs to briefly flashed natural images. The authors find some RGCs that are consistent with this null model and many that are not consistent. The authors correlate deviations from linear spatial summation to deviations revealed by contrast reversing gratings. They also used a mixed-contrast, flashed-checkerboard paradigm to map the contrast tuning and rectification of RF subunits. Finally, the authors show that some of these results track with functionally distinct RGC types such as direction-selective and "IRS" RGCs.

    The data and analyses presented in this manuscript are high quality. However, I think the study is largely consistent with many previous studies that demonstrate nonlinear spatial integration among RGCs in the mammalian (including mouse) retina. I think the Authors view the use of natural stimuli as a major departure from previous work, but I'm not convinced of this for two reasons. First, I don't see a compelling reason to think that results using contrast reversing gratings or other 'textured stimuli' (e.g. Schwartz et al Nat Neuro 2012) would fail to generalize to flashed natural scenes. Second, the implicit claim here is that a 200ms flashed natural scene interleaved with an 800ms gray screen is a natural stimulus. I think this assumes a lot about the space-time separability of the RF mechanisms, and these assumptions are not well justified.

    Major Concerns:

    1. I think the introduction of the manuscript is building a straw man argument, suggesting that many (or most) scientists think the retina is predominantly linear. A pubmed search of 'retinal ganglion cell' and 'nonlinear' produced more than 300 studies. Specifying subunit nonlinearity produces 28 studies. The discovery of subunit nonlinearities is roughly 50 years old and many manuscripts demonstrate Y-like receptive fields are more common across RGC types than X-like receptive fields.

    The goal of our work was not to show that receptive fields of mouse retinal ganglion cells are (often) spatially nonlinear, but to test whether these nonlinearities matter for natural images. It is conceivable that spatial nonlinearities as measured with typical artificial stimuli such as spatial gratings or spatiotemporal white noise are not (as) relevant for natural images because the simultaneous occurrence of strong positive and negative contrast inside a receptive field is much rarer in natural images. Indeed, in our work we find that traditional measurements of spatial nonlinearities with reversing gratings do not provide a robust quantitative prediction of whether spatial nonlinearities matter under natural images for a given ganglion cell. As laid out in the Introduction, there is surprisingly little research yet on how spatial nonlinearities affect the encoding of natural images, and in a revised version of the manuscript, we will aim to clarify that this is the focus of our work here.

    2. The authors seem to be arguing that the spatial nonlinearities engaged by the contrast reversing gratings are not the same as those engaged by their natural scenes (Figure 3). However, I think the authors are assuming too much that the spatial and temporal components of the RFs are separable. The flashed natural scenes are interleaved with relatively long gray screens. The contrast-reversing gratings are reversed in a square-wave fashion with no interleaved gray screen. These distinct spatiotemporal dynamics in the stimuli seem likely to explain the difference. This would also seem likely to explain why the flashed checkerboards in Figure 4 produced results more correlated to flashed scenes in Figure 1. In summary, I don't see a strong reason to think the authors are observing anything other than subunit rectification of the sort described by Hochstein and Shapley in the 1970s and followed up in many subsequent studies.

    We do not think that spatial nonlinearities as observed with reversing gratings or with natural stimuli are related to different mechanisms. The point of our analysis was rather to assess whether typical assessments of spatial nonlinearities with reversing gratings allow quantitative predictions about the relevance of spatial nonlinearities under flashed natural images, and we find that this is often not the case. We believe that this is largely due to the differences in spatial structure, in particular, the prevalence of high-contrast edges in the gratings. Yet, indeed, differences in temporal stimulus structure might also contribute. We actually tested flash-like presentations of gratings in some of our recordings, and results were quite similar to those obtained with contrast-reversing gratings and led to the same conclusions. We will describe this in the revised manuscript for clarification.

    3. It is not clear to this reviewer that flashed natural images interleaved with a gray screen are qualitatively more natural than white noise, sinusoidal gratings, or square-wave gratings.

    The spatial structure of natural images is the focus of the present work. It is in this aspect that flashed photographs are more natural than typical artificial stimuli like spatiotemporal white noise or gratings. In particular, natural images contain a broad spectrum of spatial frequencies with relatively more power at lower frequencies, and they combine occasional edges with intensity gradients and textures. Gratings, for example, are characterized by high power at high spatial frequencies, that is, high spatial contrast, which is well suited for triggering effects of spatial nonlinearities but occurs much more rarely in natural images. Thus, understanding whether spatial nonlinearities are important in a natural setting requires considering stimuli that match the natural spatial structure. It seems likely that nonlinear spatial integration observed under flashed presentation of natural images remains relevant when stimuli are supplemented with natural temporal structure, even though the latter may trigger additional effects that shape the responses (e.g. adaptation or nonlinear temporal integration).

    4. The null-model constructed by the authors in Figure 1 assumes the RF follows a specific functional form (e.g. Gaussian). However, many studies show that individual RFs frequently exhibit strong deviations from a Gaussian RF. To what extent are the deviations from the null model produced by deviations from linear summation or just linear mechanisms that deviate from the specific parametric form imposed by the model?

    Measuring the detailed structure of receptive fields (RFs) with high precision from time-limited experiments is a challenge, and using a fitted (elliptical) Gaussian profile is a standard procedure for limiting the effect of noise in the RF structure. We also tried using the pixel-wise spatial profile obtained from the reverse-correlation analysis as a spatial filter, but results were similar, yet often more noisy. We therefore settled on the standard procedure of using a Gaussian fit to the RF. Deviations from the Gaussian profile can indeed contribute to deviations of the model. Yet, for natural images, which have most of their power in low spatial frequencies, these deviations are likely to be small. Furthermore, our subsequent analyses show that the Gaussian RF model provides a useful baseline because it allows us to extract the relation between model deviations and image structure. In addition, the results from the model analysis were supported by the findings under presentation of blurred natural images, which did not require any assumptions about the underlying RF model. In a revised manuscript, we will point out that relying on Gaussian RFs is a choice that we make and that deviations of the receptive field structure may contribute to decreased model performance, but that the subsequent analyses support the usefulness of the applied Gaussian RF model.
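    The role of the Gaussian receptive field in the linear model can be illustrated with a short sketch: an elliptical Gaussian profile is evaluated on the pixel grid, normalized, and used to compute a generator signal as the weighted sum of image pixels. The parameterization, the unit-sum normalization, and the function names are assumptions for illustration and need not match the fitting procedure of the manuscript.

```python
import math

def elliptical_gaussian(nx, ny, x0, y0, sigma_major, sigma_minor, theta):
    """Unit-sum elliptical Gaussian weights on an ny-by-nx pixel grid."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rf = []
    for y in range(ny):
        row = []
        for x in range(nx):
            dx, dy = x - x0, y - y0
            u = cos_t * dx + sin_t * dy    # coordinate along the major axis
            v = -sin_t * dx + cos_t * dy   # coordinate along the minor axis
            row.append(math.exp(-0.5 * ((u / sigma_major) ** 2
                                        + (v / sigma_minor) ** 2)))
        rf.append(row)
    total = sum(sum(row) for row in rf)
    return [[w / total for w in row] for row in rf]

def generator_signal(image, rf):
    """Linear receptive field output: sum of image pixels weighted by the RF."""
    return sum(w * p
               for rf_row, img_row in zip(rf, image)
               for w, p in zip(rf_row, img_row))

size = 21
rf = elliptical_gaussian(size, size, x0=10, y0=10,
                         sigma_major=3.0, sigma_minor=2.0, theta=math.pi / 6)

# Hypothetical stimuli: a uniform patch and a fine checkerboard with mean zero.
uniform_image = [[0.3] * size for _ in range(size)]
checker_image = [[1.0 if (x + y) % 2 == 0 else -1.0 for x in range(size)]
                 for y in range(size)]

g_uniform = generator_signal(uniform_image, rf)
g_checker = generator_signal(checker_image, rf)
```

    The smooth Gaussian averages out the fine checkerboard almost completely, so the linear model predicts nearly no response to it. This is the baseline against which sensitivity to fine spatial contrast shows up as a model deviation.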

    5. It was unclear how the authors rule out the contribution of differences in (nonlinear) temporal integration to the effects in this study. In general, RGC RFs are not space-time separable, and it seems that the analyses in the manuscript assume they are.

    Our choice of using flashed images as stimuli with no temporal structure beyond onset and offset and assessing responses via elicited spike counts was motivated by focusing on spatial stimulus integration and minimizing effects of temporal processing. Nonetheless, our extraction of receptive fields from measurements under spatiotemporal white-noise stimulation uses a space-time separation of the spike-triggered average. Thus, the lack of space-time separability of ganglion cell receptive fields can contribute to the putative underestimation of surround components, which we have discussed in the manuscript. In a revised manuscript, we will add an explicit reference to the issue of space-time separability.

    6. This study overlaps significantly with Cao, Merwine and Grzywacz (2011), 'Dependence of retinal Ganglion cell's responses on local textures of natural scenes', Journal of Vision. This article is not cited here, but in my view, the major conclusions are similar.

    Thank you for pointing us to this paper, which is indeed relevant for our work. Both the Cao et al. paper and our manuscript evaluate the effect of spatial contrast in natural images by relating spatial contrast to response deviations from a linear-RF model, albeit with different methods. An important difference, apart from the different species, is that our work then focuses on relating the identified effects of spatial contrast to functional characterizations of the specific nonlinear operations inside the receptive field (e.g. rectification). Furthermore, we also focus on the diversity of spatial-integration properties between cells and cell types, including the description of spatially linear cells and cells that are inversely sensitive to spatial contrast. In a revised manuscript, we will add a comparison to the methods and results from Cao et al.

    7. In my experience, the strength of subunit rectification can be labile during ex vivo experiments. What controls have the authors performed to ensure that the effects they are studying remain stable over the duration of their recordings?

    Experimental rundown could, of course, affect subunit rectification as well as other response aspects, such as overall sensitivity. However, we observed that responses for different repeats of the same natural images were typically quite stable over the course of the hour-long stimulus. As also discussed in the response to Reviewer 1, we now analyzed how responses to late trials deviated from responses to early trials and found that only a small subset of cells displayed sizeable drift. Furthermore, those cases were mostly affected by a global drift in response size, keeping the relative responses for different images approximately constant. (For 94% of cells, the correlation over images between the average responses for the first five and for the last five trials was larger than 0.7, approximately on the level of estimated random trial-by-trial variability.) This indicates that the features of stimulus integration did not change substantially over the course of the experiment. In addition, nonlinearities as assessed with our flashed checkerboards were strongly correlated to nonlinearities under natural images, despite the fact that these stimuli were applied 1-2 hours apart. Thus, the strength of subunit rectification appears to be sufficiently stable to allow comparison over different stimuli.

  2. ###Reviewer #3:

    The manuscript by Karamanlis and Gollisch examines the responses of mouse retinal ganglion cells (RGCs) to natural stimuli. The primary conclusion of the manuscript is that spatial integration of stimuli within the receptive field is nonlinear. This nonlinear integration is consistent with "local signal rectification". This results in a set of RGCs that are sensitive to spatial contrast within the RF. The Authors also note the presence of cells that are suppressed by contrast and cells that prefer uniform stimulation of the RF. To reach these conclusions the authors use multi-electrode array recordings from isolated mouse retina. Spatial RFs are estimated using white noise stimuli, which are then used to generate a null-model for linear spatial summation. They compare predictions of this null-model to the responses of the same RGCs to briefly flashed natural images. The authors find some RGCs that are consistent with this null model and many that are not consistent. The authors correlate deviations from linear spatial summation to deviations revealed by contrast reversing gratings. They also used a mixed-contrast, flashed-checkerboard paradigm to map the contrast tuning and rectification of RF subunits. Finally, the authors show that some of these results track with functionally distinct RGC types such as direction-selective and "IRS" RGCs.

    The data and analyses presented in this manuscript are high quality. However, I think the study is largely consistent with many previous studies that demonstrate nonlinear spatial integration among RGCs in the mammalian (including mouse) retina. I think the Authors view the use of natural stimuli as a major departure from previous work, but I'm not convinced of this for two reasons. First, I don't see a compelling reason to think that results using contrast reversing gratings or other 'textured stimuli' (e.g. Schwartz et al Nat Neuro 2012) would fail to generalize to flashed natural scenes. Second, the implicit claim here is that a 200ms flashed natural scene interleaved with an 800ms gray screen is a natural stimulus. I think this assumes a lot about the space-time separability of the RF mechanisms, and these assumptions are not well justified.

    Major Concerns:

    1. I think the introduction of the manuscript is building a straw man argument, suggesting that many (or most) scientists think the retina is predominantly linear. A pubmed search of 'retinal ganglion cell' and 'nonlinear' produced more than 300 studies. Specifying subunit nonlinearity produces 28 studies. The discovery of subunit nonlinearities is roughly 50 years old and many manuscripts demonstrate Y-like receptive fields are more common across RGC types than X-like receptive fields.

    2. The authors seem to be arguing that the spatial nonlinearities engaged by the contrast-reversing gratings are not the same as those engaged by their natural scenes (Figure 3). However, I think the authors are assuming too readily that the spatial and temporal components of the RFs are separable. The flashed natural scenes are interleaved with relatively long gray screens. The contrast-reversing gratings are reversed in a square-wave fashion with no interleaved gray screen. These distinct spatiotemporal dynamics in the stimuli seem likely to explain the difference. This would also seem likely to explain why the flashed checkerboards in Figure 4 produced results more correlated to flashed scenes in Figure 1. In summary, I don't see a strong reason to think the authors are observing anything other than subunit rectification of the sort described by Hochstein and Shapley in the 1970s and followed up in many subsequent studies.

    3. It is not clear to this reviewer that flashed natural images interleaved with a gray screen are qualitatively more natural than white noise, sinusoidal gratings, or square-wave gratings.

    4. The null model constructed by the authors in Figure 1 assumes the RF follows a specific functional form (e.g. Gaussian). However, many studies show that individual RFs frequently exhibit strong deviations from a Gaussian RF. To what extent are the deviations from the null model produced by deviations from linear summation, rather than by linear mechanisms that simply deviate from the specific parametric form imposed by the model?

    5. It was unclear how the authors rule out the contribution of differences in (nonlinear) temporal integration to the effects in this study. In general, RGC RFs are not space-time separable, and it seems that the analyses in the manuscript assume they are.

    6. This study overlaps significantly with Cao, Merwine and Grzywacz (2011), 'Dependence of retinal Ganglion cell's responses on local textures of natural scenes', Journal of Vision. This article is not cited here, but in my view, the major conclusions are similar.

    7. In my experience, the strength of subunit rectification can be labile during ex vivo experiments. What controls have the authors performed to ensure that the effect they are studying remains stable over the duration of their recordings?

  3. ###Reviewer #2:

    Summary:

    Understanding how retinal ganglion cells respond to natural stimuli is a central but daunting question, which retinal neurophysiologists have begun to tackle recently. Here Karamanlis and Gollisch perform large-scale multi-electrode recordings in the mouse retina and demonstrate that the responses of many ganglion cells cannot be predicted by standard linear-nonlinear (LN) models. They go on to test a variety of clever artificial stimuli that emphasize and allow for the quantification of the non-linear aspects of RGC responses and convincingly demonstrate that non-linear processing is associated with sensitivity to fine spatial contrasts (subunits) and local rectification. While these aspects of RGC receptive fields have been previously described, demonstrating their applicability to natural vision is a significant advancement.

    Major Comments:

    My first main concern is with the way the paper is written. It does not highlight the significant advancements but rather emphasizes what is already known from other studies. For example, many of the conclusions of non-linear spatial integration & signal rectification arising in bipolar cells have been well described previously. By contrast, novel aspects like the sensitivity to reversing gratings being unrelated to LN model performance for natural scenes should be explained in more detail. The authors should more clearly state the major advancements that are being made here beyond what has already been shown previously (e.g. Turner and Rieke, 2016).

    Second, the authors never include non-linear subunits in their model to demonstrate improved performance. Testing models with filters that incorporate rectification and convexity as experimentally determined will enable them to show their utility more convincingly. Without this, the reader is left with the conclusion that there are RGCs that exhibit non-linear or linear spatial integration (already known) and that non-linear integrators cause LN models to perform poorly with natural images (Turner and Rieke, 2016).

    Third, I'm not sure how 'natural' their natural images are, given that static images are flashed over the cell intermittently. While such stimuli might simulate some sort of saccadic eye movements, whether this is relevant for mouse vision is not clear. Would linear models be more predictive for responses to natural movies? Some discussion on this issue would be helpful.
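
    To make the second point above concrete, here is a minimal sketch (not the authors' model) contrasting a linear receptive field with a rectified-subunit readout. The 1D layout, uniform weights, number of subunits, and function names are all assumptions chosen for illustration. Both stimuli have the same mean intensity, so a linear model cannot distinguish them, whereas the subunit model responds to the fine spatial contrast.

```python
import numpy as np

def linear_response(stimulus, weights):
    """LN-style readout: weighted sum over space, then output rectification."""
    return max(0.0, float(np.sum(weights * stimulus)))

def subunit_response(stimulus, weights, n_subunits):
    """Hypothetical subunit model: each subunit sums its patch linearly,
    is half-wave rectified, and the rectified outputs are summed."""
    patches_s = np.array_split(stimulus, n_subunits)
    patches_w = np.array_split(weights, n_subunits)
    subunit_signals = [np.sum(w * s) for w, s in zip(patches_w, patches_s)]
    return float(sum(max(0.0, g) for g in subunit_signals))

# Toy 1D receptive field with uniform weights over 8 pixels (an assumption).
weights = np.full(8, 1.0 / 8)

# Two stimuli with the same mean contrast (zero) but different spatial structure.
uniform = np.zeros(8)                                      # homogeneous gray
textured = np.array([1, 1, -1, -1, 1, 1, -1, -1], float)   # fine spatial contrast

lin_uniform = linear_response(uniform, weights)
lin_textured = linear_response(textured, weights)
sub_uniform = subunit_response(uniform, weights, n_subunits=4)
sub_textured = subunit_response(textured, weights, n_subunits=4)

print(lin_uniform, lin_textured)   # linear model: identical for both stimuli
print(sub_uniform, sub_textured)   # subunit model: larger for the textured stimulus
```

    In a real comparison, the subunit layout and nonlinearity would be constrained by the measured rectification and convexity rather than fixed by hand as they are here.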

  4. ###Reviewer #1:

    This paper investigates how retinal ganglion cells integrate inputs across space, with a focus on natural images. Nonlinear spatial integration is a well-studied property of ganglion cells, but it has been largely characterized using grating stimuli. A few studies have extended this to look at spatial integration in the context of natural images, but we certainly lack a comprehensive treatment of that issue. The current paper has a number of strengths - notably using a number of complementary stimuli and analysis tools to study a large population of ganglion cells and linking properties of responses to artificial stimuli with those to natural stimuli. It also has a few weaknesses (some detailed carefully in the paper) - such as the inability to identify ganglion cell types (aside from a few), and to pinpoint specific circuit mechanisms. These are limitations of the techniques used. This is not a request as much as setting the context of the contribution of the paper. Generally the paper was in good shape, and the data supported the conclusions well. I do think there are a number of issues that could be strengthened. Those are listed below in rough order of importance.

    Statistical correlations in natural scenes:

    A number of analyses in the paper rely on estimating the spatial contrast from an image and comparing the dependence of various measures of the cells' responses on spatial contrast. A danger in this analysis is that spatial contrast is likely correlated with many other statistical properties of the image, so attributing a given response property to spatial contrast has some potential confounds. This issue should be discussed as a possible caveat, unless the authors can rule it out. The paper, accurately, describes the results in terms of correlations (and not causal relationships), but some discussion of the complexity of natural image statistics would be helpful.

    Comparison of grating and natural scene spatial scale:

    The section starting around line 233 was confusing for several reasons. First, this section starts by measuring the spatial scale associated with the grating responses, and then compares that to LN model performance for natural inputs. It's not clear why the spatial scale is the relevant aspect of the responses to gratings. Indeed, the next paragraph provides a measure of the relative sensitivity of the nonlinear and linear response components (via a comparison of F1 and F2 responses). It would be helpful to include some initial text to motivate the different measures of the grating responses and to anticipate that you will look at both spatial scale and sensitivity. A related issue that bears more directly on the scientific conclusions comes up later in the blurring experiments. The issue is whether it is valid to directly compare the apparent spatial scale of nonlinear responses to images (estimated via blurring) with that of the grating responses. Natural images should have much higher power at low spatial frequencies, and this may strongly impact the spatial scale identified with the blurring experiments.
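
    For readers unfamiliar with the F1/F2 comparison mentioned above, the following sketch shows one common way to extract the two harmonics from a periodic firing-rate trace. The synthetic responses, sampling, and function name are assumptions for illustration; the exact index used in the manuscript is not reproduced here. A response modulated once per stimulus period has its power at F1, whereas a frequency-doubled response, the signature of nonlinear spatial integration under contrast reversal, has its power at F2.

```python
import numpy as np

def harmonic_amplitudes(rate, n_periods):
    """Return (F1, F2): Fourier amplitudes of `rate` at the stimulus frequency
    and at twice that frequency. `rate` must span exactly `n_periods` periods."""
    spectrum = np.fft.rfft(rate) / len(rate)
    f1 = 2.0 * np.abs(spectrum[n_periods])
    f2 = 2.0 * np.abs(spectrum[2 * n_periods])
    return f1, f2

# Synthetic firing rates over 4 stimulus periods (illustrative, not recorded data).
n_periods, samples_per_period = 4, 100
phase = 2 * np.pi * np.arange(n_periods * samples_per_period) / samples_per_period

linear_cell = 10 + 5 * np.sin(phase)          # modulated once per period
nonlinear_cell = 10 * np.abs(np.sin(phase))   # frequency-doubled response

f1_lin, f2_lin = harmonic_amplitudes(linear_cell, n_periods)
f1_nl, f2_nl = harmonic_amplitudes(nonlinear_cell, n_periods)

print(f"linear-like cell:    F1={f1_lin:.2f}, F2={f2_lin:.2f}")
print(f"nonlinear-like cell: F1={f1_nl:.2f}, F2={f2_nl:.2f}")
```

    Note that such amplitudes quantify the relative strength of the nonlinear component at one grating width; they say nothing by themselves about spatial scale, which is why the two measures deserve separate motivation.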

    Clustering of orientation-selective cells:

    An interesting suggestion in the paper is that the orientation-selective cells can be divided into two groups that differ in their spatial integration properties. Do these groups represent different orientations, as suggested in the text? That seems a simple piece of information to add. Related to this, I would suggest moving Figure S4 into the main text.

    Presentation of checkerboard stimuli and results:

    The checkerboard analysis, particularly how it isolates properties of spatial integration, could be introduced more thoroughly for a reader unfamiliar with it. A related issue is how well the chosen isoresponse contour captures structure in the full distribution of responses. In some cases that looks pretty good, but in others it is less clear. Could you add a supplementary figure or something similar that characterizes how consistent the isoresponse contours are for different response levels?

    Drift in responses over time:

    Some of the rasters - e.g. the bottom left in Figure 1C - show considerable drift over time. It is important that this drift not be interpreted as a failure of the LN model and hence indicative of nonlinear spatial integration. Can you test for drift like this across cells, and exclude any that seem potentially problematic? More generally, some assurance that the variability in the responses for a given generator signal value is real variability across images is needed.
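
    One simple way such a drift test could look is sketched below. This is an assumption-laden illustration, not a procedure taken from the manuscript: the spike counts are made up, and the drift measure and the 2% threshold are hypothetical choices. The idea is to fit a line to spike count versus trial index for repeated presentations and to flag cells whose normalized slope exceeds a cutoff.

```python
import numpy as np

def drift_slope(trial_counts):
    """Least-squares slope of spike count versus trial index, expressed as
    a fraction of the mean count per trial (hypothetical drift measure)."""
    counts = np.asarray(trial_counts, dtype=float)
    trials = np.arange(len(counts))
    slope, _intercept = np.polyfit(trials, counts, deg=1)
    mean_count = counts.mean()
    return slope / mean_count if mean_count > 0 else 0.0

# Illustrative spike counts for one image over 10 repeats (made-up numbers).
stable_cell = [12, 11, 13, 12, 12, 11, 13, 12, 12, 12]
drifting_cell = [20, 19, 17, 16, 14, 13, 11, 10, 8, 7]

DRIFT_THRESHOLD = 0.02  # assumed cutoff: 2% change of the mean per trial

stable_flag = abs(drift_slope(stable_cell)) > DRIFT_THRESHOLD
drifting_flag = abs(drift_slope(drifting_cell)) > DRIFT_THRESHOLD

print("stable cell flagged:  ", stable_flag)
print("drifting cell flagged:", drifting_flag)
```

    In practice the slope would be pooled across images, and a rank-based statistic may be more robust to bursts than a linear fit.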

  5. ##Preprint Review

    This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

    ###Summary:

    All of the reviewers expressed concerns about the advance that the work described in the paper represents. These issues were a focus of the consultation among the reviewers. The main concern is that the work needs to go beyond demonstrating that some ganglion cells exhibit nonlinear integration for naturalistic inputs - as that point is quite well established in the literature. The comparison between natural stimuli and gratings could help in this regard, but several issues confound that comparison (e.g. differences in dynamics of the two types of stimuli). These concerns are detailed in the individual reviews below.