A visual sense of number emerges from divisive normalization in a simple center-surround convolutional network

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    The current manuscript presents a computational model of numerosity estimation. The model relies on center-surround contrast filters at different spatial scales with divisive normalization between their responses. Using dot arrays as visual stimuli, it is shown that the summed normalized responses of the filters are sensitive to numerosity and insensitive to the low-level visual features of dot size and spacing. Importantly, the model provides an explanation of various spatial and temporal illusions in visual numerosity perception.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)


Abstract

Many species of animals exhibit an intuitive sense of number, suggesting a fundamental neural mechanism for representing numerosity in a visual scene. Recent empirical studies demonstrate that early feedforward visual responses are sensitive to numerosity of a dot array but substantially less so to continuous dimensions orthogonal to numerosity, such as size and spacing of the dots. However, the mechanisms that extract numerosity are unknown. Here, we identified the core neurocomputational principles underlying these effects: (1) center-surround contrast filters; (2) at different spatial scales; with (3) divisive normalization across network units. In an untrained computational model, these principles eliminated sensitivity to size and spacing, making numerosity the main determinant of the neuronal response magnitude. Moreover, a model implementation of these principles explained both well-known and relatively novel illusions of numerosity perception across space and time. This supports the conclusion that the neural structures and feedforward processes that encode numerosity naturally produce visual illusions of numerosity. Taken together, these results identify a set of neurocomputational properties that gives rise to the ubiquity of the number sense in the animal kingdom.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

The manuscript is clear and well-written and provides a novel and interesting explanation of different illusions in visual numerosity perception. However, the model used in the manuscript is very similar to Dehaene and Changeux (1993) and the manuscript does not clearly identify novel computational principles underlying the number sense, as the title would suggest. Thus, while we were all enthusiastic about the topic and the overall findings, the paper currently reads as a bit of a replication of the influential Dehaene & Changeux (1993) model, and the authors need to do more to compare/contrast to bring out the main results that they think are novel.

    Major concerns:

    1. The model presented in the current manuscript is very similar to the Dehaene and Changeux 1993 model. The main difference is in the implementation of lateral inhibition in the DoG layer where the 1993 model used a recurrent implementation, and the current model uses divisive normalization (see minor concern #1). The lateral inhibition was also identified as a critical component of numerosity estimation in the 1993 model, so the novelty in elucidating the computational principles underlying the number sense in the current manuscript is not evident.

    If the authors hypothesize that the particular implementation of lateral inhibition used here is more relevant and critical for the number sense than the forms used in previous work (e.g., the recurrent implementation of the 1993 model or the local response normalization of the more recent models), then a direct comparison of the effects of the different forms is necessary to show this. If not, then the focus of the manuscript should be shifted (e.g., changing the title) to the novel aspects of the manuscript such as the use of the model to explain various visual illusions and adaptation and context effects.

Thank you for bringing up these issues. We acknowledge that there was a lack of clear explanation of the key differences between the proposed model and that of Dehaene & Changeux (hereafter D&C). Please see our revisions below, where we: 1) explain the D&C model and its limitations in more detail; 2) describe our critical changes to the D&C model; and 3) show how those critical changes allow a novel way to explain numerosity perception.

    The paragraph in the Introduction where we first introduce D&C is modified to read:

    “The computational model of Dehaene and Changeux (1993) explains numerosity detection based on several neurocomputational principles. That model (hereafter D&C) assumes a one-dimensional linear retina (each dot is a line segment), and responses are normalized across dot size via a convolution layer that represents combinations of two attributes: 1) dot size, as captured by difference-of-Gaussian contrast filters of different widths; and 2) location, by centering filters at different positions. In the convolution layer, the filter that matches the size of each dot dominates the neuronal activity at the location of the dot owing to a winner-take-all lateral inhibition process. To indicate numerosity, a summation layer pools the total activity over all the units in the convolution layer. While the D&C model provided a proof of concept for numerosity detection, it has several limitations as outlined in the discussion. Of these, the most notable is that strong winner-take-all in the convolution layer discretizes visual information (e.g., discrete locations and discrete sizes yielding a literal count of dots), which is implausible for early vision. As a result, the output of the model is completely insensitive to anything other than number in all situations, which is inconsistent with empirical data (Park et al., 2021).”

    The revised Discussion describes our critical modifications to D&C and their consequences.

“At first blush, the current model might be considered an extension of Dehaene and Changeux (1993). However, there are four ways in which the current model differs qualitatively from the D&C model. First, the D&C model is one-dimensional, simulating a linear retina, whereas we model a two-dimensional retina feeding into center-surround filters, allowing application to the two-dimensional images used in numerosity experiments (Fig. 1A). Second, extreme winner-take-all normalization in the convolution layer of the D&C model implausibly limits visual precision by discretizing the visual response. For example, the convolution layer in the D&C model only knows which of 9 possible sizes and 50 possible locations occurred. In contrast, by using divisive normalization in the current model, each dot produces activity at many locations and many filter sizes despite normalization, and a population could be used to determine exact location and size. Third, extreme winner-take-all normalization also eliminates all information other than dot size and location. By using divisive normalization, the current model represents other attributes such as edges and groupings of dots (Fig. 1B), and these other attributes provide a different explanation of number sensitivity as compared to D&C. For example, the D&C model as applied to the spacing effect between two small dots (Fig. 4A) would represent the dots as existing discretely at two close locations versus two far locations, with the total summed response being two in either case. In contrast, the current model gives the same total response for a different reason. Although the small filters are less active for closely spaced dots, the closely spaced dots look like a group as captured by a larger filter, with this addition for the larger filter offsetting the loss for the smaller filter. Similarly, as applied to the dot size effect (Fig. 4B), the D&C model would only represent the larger dots using larger filters. In contrast, the current model represents larger dots with larger filters and with smaller filters that capture the edges of the larger dots, and yet the summed response remains the same in each case owing to divisive normalization (again, there are offsetting factors across different filter sizes). The final difference is that the D&C model does not include temporal normalization, which we show to be critical for explaining adaptation and context effects.”

In sum, the current model explains a wider range of effects by using representations and processes that more closely reflect early vision. The change to two dimensions allows application to real images. The inclusion of temporal normalization allows application to temporal effects. The change from winner-take-all to divisive normalization might appear to be a mere parameter setting, but it is one that produces qualitatively different results and explanations (e.g., representations of edges and groupings that are part of the explanation of selective sensitivity to number). These behaviors are consistent with empirical data and are qualitatively different from those of the D&C model. Now that we have highlighted the ways in which this model differs qualitatively from the D&C model, we hope that our original title still works.
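The core pipeline discussed here (two-dimensional center-surround filtering at several spatial scales, divisive normalization across units, then summation) can be sketched in a few lines of NumPy. This is a hypothetical, minimal illustration, not the authors' implementation: the kernel size, the rectification, the Gaussian normalization pool (`pool_sigma`), and the semi-saturation constant (`semi_sat`) are all assumptions, and the published model's normalization neighborhoods and parameters may differ.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    # 2D Gaussian kernel, normalized to unit sum
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def dog_kernel(size, sigma, surround_ratio=2.0):
    # center-surround filter: excitatory center minus broader inhibitory surround
    return gaussian_kernel(size, sigma) - gaussian_kernel(size, sigma * surround_ratio)

def conv2d_fft(image, kernel):
    # circular convolution via FFT, with the kernel re-centered at the origin
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def summed_normalized_response(image, sigmas=(1.0, 2.0, 4.0, 8.0),
                               semi_sat=1e-3, pool_sigma=8.0):
    # 1) rectified responses of center-surround filters at several spatial scales
    drive = np.stack([np.maximum(conv2d_fft(image, dog_kernel(33, s)), 0.0)
                      for s in sigmas])
    # 2) divisive normalization: each unit is divided by a local pool of
    #    activity summed across scales (the pool's shape is an assumption here)
    pool = conv2d_fft(drive.sum(axis=0), gaussian_kernel(33, pool_sigma))
    normalized = drive / (semi_sat + np.maximum(pool, 0.0))
    # 3) the "number" signal: total activity summed over all normalized units
    return float(normalized.sum())
```

In this sketch, because the normalization pool is local, each well-separated dot contributes a roughly fixed amount to the total regardless of its size, so the summed response tends to grow with the number of dots rather than with total stimulus energy.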

    Reviewer #2 (Public Review):

    This is a very interesting and novel model of numerosity perception, based on known computational principles of the visual system: center-surround mechanisms at various scales, combined with divisive normalization (over space and time). The model explains, at least qualitatively, several of the important aspects of numerosity perception.

Firstly, the model makes major and minor predictions. Major: the effect of adaptation, at least 30%, as well as independence from density and dot size; minor: tiny effects like irregularity, around 6%. I think it would make sense to separate these. To my knowledge, it is the first to account for adaptation, which was the major effect that brought numerosity into the realm of psychophysics: and it explains it effortlessly, using an intrinsic component of the model (divisive normalization), not with an ad-hoc add-on. This should be highlighted more. And perhaps the fit can be more quantitative. Murphy and Burr (who they cite) showed that the adaptation is rapid. How does this fit the model? Very well, I would have thought.

    Thanks for the positive evaluation of our work. In the revised manuscript, we followed the reviewer’s suggestion to highlight the novelty of the model in its explanation of numerosity adaptation. As the reviewer says, one significant aspect of our work is that the model can explain a relatively large effect of numerosity adaptation with minimal effort. To be clear, even though we call it “numerosity” adaptation, the model does not know number in any explicit way. One way to highlight this aspect, we thought, is to compare the current adaptation results to a simulation where the adaptor and target are defined along the dimensions of size or spacing. In such cases (which are now reported in Fig. S6 and S7), no reliable under- or over-estimation was observed. These results suggest that numerosity adaptation is a natural byproduct of divisive normalization working across space and time.

The question about the rapidity of adaptation is indeed an interesting one. However, the current model is not designed to simulate the effect of exposure duration on neural activity. More specifically, the current model operates across trials and stimuli (e.g., one response per stimulus), using a single parameter that captures the temporal gradient of divisive normalization from prior trials (e.g., the influence of two trials ago as compared to one trial ago). As currently formulated, the model does not address adaptation at the level of milliseconds, as would be necessary to model adaptor duration. Modeling adaptation at the millisecond level requires a dynamic model that specifies not only the rate of adaptation but also the rate of recovery from adaptation, such as the visual orientation adaptation model of Jacob, Potter, and Huber (2021), which includes the dynamics of synaptic depression and synaptic recovery. In future work we hope to make such modifications to the model to expand the range of explained effects. Nevertheless, a dynamic version of the model should encompass this simpler trial-by-trial version as a special case. Our goal in this study was a clear demonstration of the neural mechanisms underlying numerosity in early vision, and so we have attempted to keep the model as simple as possible while still capturing neural behavior.
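The trial-by-trial temporal normalization described here can be illustrated with a toy sketch (ours, not the authors' code): each trial's summed feedforward drive is divided by its own magnitude plus an exponentially weighted history of prior trials' drives, with the history governed by a single `decay` parameter; `semi_sat` is an assumed semi-saturation constant.

```python
def trialwise_response(drives, decay=0.5, semi_sat=1.0):
    # drives: summed feedforward activity for each trial's stimulus, in order.
    # Each output is divisively normalized by the current drive plus an
    # exponentially decaying trace of prior trials' drives (temporal gradient).
    out = []
    history = 0.0
    for d in drives:
        out.append(d / (semi_sat + d + history))
        history = decay * (history + d)  # two trials ago weighs less than one
    return out
```

With these toy numbers, `trialwise_response([50.0, 10.0])[1]` (target preceded by a high-numerosity adaptor) comes out smaller than `trialwise_response([2.0, 10.0])[1]` (same target after a low-numerosity adaptor), i.e., the response to an identical stimulus is suppressed after adaptation, the direction of the underestimation effect.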

We have elected not to fit data; instead, we explored the model's behavior qualitatively, asking whether the commonly observed numerosity effects emerge from the model in the correct direction regardless of its parameter values (e.g., as reported in Fig. S2). This approach follows from our central aim, which is to explain the neurocomputational principles of the number sense rather than to produce a detailed model with specific parameter values fit to data. Our aim was to show that the correct qualitative behaviors naturally emerge from these principles without requiring specific parameter values (and, more importantly, to show how these behaviors emerge from these principles).

    Jacob, L. P., Potter, K. W., & Huber, D. E. (2021). A neural habituation account of the negative compatibility effect. Journal of Experimental Psychology: General, 150(12), 2567.

    Among the tiny predicted effects (visually indistinguishable bar graphs) is the connectedness effect. But this is in fact large, up to 20%. I would say they fail here, by predicting only 6%. And I would say this is to be expected, as the illusion relies on higher-order properties (grouping), which would not immediately result from normalization. Furthermore, the illusion varies with individual personality traits (Pomè et al, JAD, 2021). The fact that it works with very thin lines suggests that it is not the physical energy of the lines that normalizes, but the perceptual grouping effect. I would either drop it, or give it as an example of where the predictions are in the right direction, but clearly fall short quantitatively. No shame in saying that they cannot explain everything with low-level mechanisms. A future revised model could incorporate grouping phenomena.

    Thank you for the suggestion. We agree that trying to explain the connectedness illusion with center-surround filters is not ideal. As the reviewer says, the main driver of the connectedness illusion is likely to be groupings of dots. The current model captures groupings of dots, but it does so in a circularly symmetric way, which is not ideal for capturing the oblong groupings (barbells) that are likely to play a role in the connectedness illusion. It is probably because of this mismatch (between the shape of the groupings and shape of the filters) that the model produces a smaller magnitude connectedness illusion. If the model included a subsequent convolution layer in which the filters were oriented lines of different sizes, it would likely produce a larger connectedness illusion. Following the reviewer’s suggestion, we have placed the connectedness illusion in the supplementary materials and only refer to this in the future directions section of the discussion, writing:

    “Another line of possible future work concerns divisive normalization in higher cortical levels involving neurons with more complex receptive fields. While the current normalization model with center-surround filters successfully explained visual illusions caused by regularity, grouping, and heterogeneity, other numerosity phenomena such as topological invariants and statistical pairing (He et al., 2015; Zhao and Yu, 2016) may require the action of neurons with receptive fields that are more complex than center-surround filters. For example, another well-known visual illusion is the effect of connectedness, whereby an array with dots connected pairwise with thin lines is underestimated (by up to 20%) compared to the same array without the lines connected (Franconeri et al., 2009). This underestimation effect likely arises from barbell-shaped pairwise groupings of dots, rather than the circularly symmetric groupings of dots that are captured with center-surround filters. Nonetheless, a small magnitude (6%) connectedness illusion emerges with center-surround filters (Fig. S10). Augmenting the current model with a subsequent convolution layer containing oriented line filters and oriented normalization neighborhoods of different sizes might increase the predicted magnitude of the illusion.”

    In short, I like the model very much, but think the manuscript could be packaged better. Bring out the large effects more, especially those that have never been explained previously (like adaptation). And try to be more quantitative.

    Thank you. We now highlight the novel computational demonstrations of adaptation to a greater degree and—as also suggested by Reviewer 1—provide more quantitative reports of the illusory effects that the model naturally produces.
