Shallow neural networks trained to detect collisions recover features of visual loom-selective neurons

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper uses an anatomically-constrained neural network model to investigate how looming visual stimuli - i.e. stimuli likely to collide with an organism - could be detected. The authors find that one dominant solution to this problem reproduces both the computational properties and neural responses of known collision detecting neurons in the fruit fly, Drosophila melanogaster, without ever being trained on neural data. Their findings shed light on why biological collision detection circuits may have converged on particular solutions. A similar approach could reveal important computational features in other circuits.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their name with the authors.)

Abstract

Animals have evolved sophisticated visual circuits to solve a vital inference problem: detecting whether or not a visual signal corresponds to an object on a collision course. Such events are detected by specific circuits sensitive to visual looming, or objects increasing in size. Various computational models have been developed for these circuits, but how the collision-detection inference problem itself shapes the computational structures of these circuits remains unknown. Here, inspired by the distinctive structures of LPLC2 neurons in the visual system of Drosophila, we build anatomically-constrained shallow neural network models and train them to identify visual signals that correspond to impending collisions. Surprisingly, the optimization arrives at two distinct, opposing solutions, only one of which matches the actual dendritic weighting of LPLC2 neurons. Both solutions can solve the inference problem with high accuracy when the population size is large enough. The LPLC2-like solution reproduces experimentally observed LPLC2 neuron responses for many stimuli, and reproduces canonical tuning of loom-sensitive neurons, even though the models are never trained on neural data. Thus, LPLC2 neuron properties and tuning are predicted by optimizing an anatomically-constrained neural network to detect impending collisions. More generally, these results illustrate how optimizing inference tasks that are important for an animal’s perceptual goals can reveal and explain computational properties of specific sensory neurons.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This paper investigates what functional properties emerge from training an anatomically-constrained neural network on a specific computational task - detection of looming visual stimuli. Several functional models are identified by optimizing a network model for this task, and one of these models matches several properties observed in the fly neurons that perform the task. The approach and results are interesting. I did feel that several aspects of the work could be described more clearly, and that the potential of the model to reveal important aspects of the computation could be probed more thoroughly.

    Inhibitory component of model. The interplay between excitatory and inhibitory components of the model could be explored in more detail. A specific aspect that is interesting is the inclusion of rectification in the inhibitory circuit. Rectification is motivated by the extra neuron in the circuit providing inhibition (lines 155-157), but it is not clear why an additional neuron would require rectification. Are there physiological measurements that indicate that the extra neuron introduces rectification, or is that a speculation? Exploring whether rectification is important would also be interesting - e.g. by removing it from the trained models, and/or training models on circuits in which rectification is absent. Lines 360-362 mention interesting response properties created by inhibition, but do not define what those are. Including some of these extensions of the basic model could highlight the potential of the model to make predictions about specific circuit features that are important for detection of looming stimuli.

    Thanks for this interesting comment. Please see the revision summary and Essential Revisions 4 for more information. In lines 360-362 of the first submission, we were referring to Fig. 10E, F, where some examples of the peripheral inhibition are shown.
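
    To make the reviewer's question about rectification concrete, here is a toy sketch of a single unit with and without a rectified inhibitory pathway. It is not the model from the manuscript: the function name `unit_response`, the signed input, and the way excitation and inhibition are combined are assumptions made purely for illustration.

```python
import numpy as np


def relu(x):
    """Half-wave rectification: negative values are clipped to zero."""
    return np.maximum(x, 0.0)


def unit_response(signal, w_exc, w_inh, rectify_inhibition=True):
    """Toy unit (hypothetical): rectified difference of two weighted sums.

    signal, w_exc, w_inh are arrays of the same shape.
    """
    excitation = np.sum(w_exc * signal)
    inhibition = np.sum(w_inh * signal)
    if rectify_inhibition:
        # A rectified intermediate neuron can only suppress the output;
        # a negative drive is clipped instead of turning into excitation.
        inhibition = relu(inhibition)
    return relu(excitation - inhibition)


signal = np.array([1.0, -1.0])
w_exc = np.array([1.0, 0.0])
w_inh = np.array([0.0, 1.0])

with_rect = unit_response(signal, w_exc, w_inh, rectify_inhibition=True)
without_rect = unit_response(signal, w_exc, w_inh, rectify_inhibition=False)
```

    In this toy setting, removing the rectification lets a negative inhibitory drive act as extra excitation, which is one way the two model variants could differ.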

    Intuition for second model class. One of the key results in the paper is the existence of two classes of solution to the optimization problem - one of which follows the expectation for a detector based on outward optical flow, and the other of which does not. It is important to explain intuitively how the "inward" model is able to detect looming stimuli, given that it seems sensitive to the wrong optical flow features. This should be early - e.g. around lines 214-216.

    We agree. In the revised manuscript, we have modified some expressions around (former) lines 214-216 to say explicitly that the inward solutions are sensitive to hit stimuli coming from the side of the receptive field rather than the center.

    In general, the results would benefit from developing some arguments in more detail. One example is the paragraph on lines 232-237. The differences in performance in Figures 8C and D stick out to me as a reader, but I am not guided through those differences in the text. Intuition for why you see the change in relative performance of the two solutions (lines 266-268) would similarly be helpful. Another example is lines 290-292. These are several examples in which more explanation would be helpful, but you could look at the results in general with this in mind.

    These are good suggestions. First, we added more explanations about the differences shown in Figure 8C. Figure 8D is basically a re-plot of the red and orange curves in Figure 8C, and shows the distance dependency of the miss signals. Second, the relative changes in the performance of the two solutions appear because the ROC and PR curves are bounded from above and the loss function is bounded from below (by 0). The better-performing solution (inward in this case) in general has less room to improve than the other one. Third, we moved the comments about the angular-size encoder from the discussion section to the results section, after the sentences starting in (former) lines 290-292.

    The performance of the two classes of solution becomes more similar as the number of neurons increases. A concern is that this reflects saturation of performance rather than actual equivalence of the models. Can you make the task harder, e.g. by adding distracting optical flow? That might help separate performance of the different models and avoid saturation.

    It is correct that the tasks are relatively easy for our model, and both outward and inward models with large enough population size can almost perfectly distinguish hit cases from others. In this revision, we engineered a new set of stimuli with rotational background flows. In this case, both inward and outward solutions are found, and the outward solutions tend to perform better than the inward ones. Though this particular choice of more difficult task seems to favor outward solutions, we find it difficult to interpret, for lack of experimental comparisons. Instead, in the discussion, we interpret this result to show the potentially strong dependence of the solution on the statistics of loom stimuli, which requires characterizing. For more details, see Essential Revisions 3.

    Figure 10: how did you choose the specific outward solution used in this figure? More generally, some measure of the similarity of model components with experiment across all outward models is important. Currently the text reads as if you chose one of many models that happened to have components that looked like those measured. This comes up again on lines 310-311 and 313-315.

    We have answered this question in the section of Essential Revisions 5. With the new, simpler model, it is no longer necessary to pick from among the distribution of solutions.

    Are there animals that detect looming stimuli with fewer loom detectors? If so it would be interesting to see if they have adopted a similar or different computation.

    This is a very interesting question. However, the authors are not sure about the number of loom detectors in other animals, and are also not aware of the existence of the inward solutions in either flies or other animals. One related point to note is that the LPLC2 neuron and its computational structure are not the only way to detect looming events, and there are other loom-sensitive neurons and neural circuits that receive very different types of visual signals, such as LC4 in flies and LGMD in locusts, which do not appear to receive directional inputs.

    Reviewer #2 (Public Review):

    The manuscript from Zhou et al. investigates how certain features of looming-detecting neurons can arise from optimizing a shallow neural network to detect imminent collisions. The authors consider architectures that resemble the known anatomy of LPLC2 neurons in Drosophila, with excitatory inputs from the four layers of motion detectors in the lobula plate and inhibitory inputs from the interneurons in those layers. The authors find that some fraction of the trained networks exhibit tuning properties of LPLC2 neurons, including (a) similar response profiles to stimuli that are not present in the training data; (b) similar dependence on the angular size of the looming object as opposed to angular velocity; and (c) similar dependence between peak response time and the ratio of size to speed of the looming object. The authors also find another solution among the trained networks that is very different from the biological circuits. However, they show that this other solution becomes less common as the number of neurons grows, which is the relevant regime for the biological circuit. This paper adds to a body of work that suggests that the structural or functional properties of brain circuits are the solution to an optimization problem implied by the task that they have to perform – in this case, the ability to detect looming motion.
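
    For readers less familiar with the quantities in (b) and (c), the sketch below spells out standard looming geometry. It assumes a flat disk of radius `radius` facing the observer and approaching head-on at constant `speed`; the function names and parameters are illustrative and are not taken from the manuscript.

```python
import numpy as np


def angular_size(t, radius, speed, t_collision=0.0):
    """Full angle (radians) subtended by the disk at time t < t_collision."""
    time_to_collision = t_collision - np.asarray(t, dtype=float)
    if np.any(time_to_collision <= 0):
        raise ValueError("t must be earlier than t_collision")
    # distance = speed * time_to_collision, and theta = 2 * arctan(radius / distance),
    # so theta depends on radius and speed only through the ratio radius / speed.
    return 2.0 * np.arctan((radius / speed) / time_to_collision)


def angular_velocity(t, radius, speed, t_collision=0.0):
    """Time derivative of angular_size (radians per unit time)."""
    a = radius / speed
    tau = t_collision - np.asarray(t, dtype=float)
    return 2.0 * a / (tau ** 2 + a ** 2)


times = np.linspace(-2.0, -0.1, 20)
theta_small_slow = angular_size(times, radius=1.0, speed=2.0)
theta_large_fast = angular_size(times, radius=2.0, speed=4.0)
```

    Because only the ratio of size to speed enters, the two stimuli above trace out the same angular-size time course, which is why that ratio is the natural variable for describing loom tuning.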

    The conclusions of the paper seem well supported within the class of models that was considered. The choice of class is, however, rather narrow and could be better explained and analyzed.

    1. One potentially confusing aspect of the work is that there are in fact three major types of solutions that are found, not only two as described in the abstract: apart from "outward" (similar to LPLC2) and "inward" (dissimilar to LPLC2) there are also "unstructured" solutions that, as far as I understand, basically fail to perform the task – although their performance isn’t adequately discussed. The authors comment on this in the Discussion, suggesting that the unstructured networks are local optima where the stochastic gradient descent algorithm they use for optimization gets stuck. They argue that evolutionary processes would be unlikely to linger there, implying that it might be fine to ignore these solutions. While reasonable, this claim is difficult to assess without more discussion of these results. These solutions are not a rare occurrence: according to the Methods, over half of the trained networks end up in the "unstructured" pile.

    In our initial submission, the term 'unstructured solution' was an unfortunate name to use for these solutions. In this revision, we call them 'zero solutions', since all the elements in the filters are zero (or very close to zero). Please see Essential Revision 2 for a more detailed answer to this comment.

    2. The stimuli used in the paper are very simple: basically rigid, featureless objects moving in a straight line and at constant velocity, or rotating at constant angular velocity. Naturalistic stimuli are likely to be much more complex, which could hurt the training process. This is only briefly touched upon in the Discussion, leaving open the question of how the results of this work would change in more natural settings.

    This is an interesting point. Please see Essential Revision 3 for our responses and changes.

    3. The authors impose a 90-degree rotation symmetry as well as a reflection symmetry on the connection weights to the four layers of motion detectors that are sensitive to the four cardinal directions. Given that the training data that is used also has these symmetries, the question arises whether imposing these symmetries by hand was necessary. This is unfortunately not discussed in the paper.

    The imposed symmetries are not strictly necessary. Please see Essential Revisions 1 for details about how we have addressed this comment.
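
    As an illustration of what imposing such symmetries by hand can look like, the sketch below derives filters for the four cardinal directions from a single base filter. This is a toy example under stated assumptions, not the authors' code: the square filter shape, the choice of reflection axis, and the name `symmetric_filter_bank` are all hypothetical.

```python
import numpy as np


def symmetric_filter_bank(base):
    """Build four direction-specific filters from one square base filter."""
    base = np.asarray(base, dtype=float)
    if base.ndim != 2 or base.shape[0] != base.shape[1]:
        raise ValueError("base must be a square 2-D array")
    # Reflection symmetry: average the filter with its mirror image about
    # the horizontal midline, the axis of the rightward motion direction.
    right = 0.5 * (base + np.flipud(base))
    # 90-degree rotation symmetry: the remaining three filters are
    # counterclockwise rotated copies of the rightward filter.
    return {
        "right": right,
        "up": np.rot90(right, 1),
        "left": np.rot90(right, 2),
        "down": np.rot90(right, 3),
    }


rng = np.random.default_rng(0)
bank = symmetric_filter_bank(rng.normal(size=(5, 5)))
```

    With this construction only the base filter is trainable, so the symmetry holds exactly at every step of training rather than having to be discovered from the data.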

    4. One highly confusing aspect is that there is, in fact, an additional symmetry: the same filters are used for all the subunits. The difference between the different subunits seems to be only in the inputs that they receive – i.e., that they are responding to different parts of the visual field. This is only really apparent from the Methods. Given again the rotational symmetry of the inputs, it would be reasonable to assume that this symmetry could be learned, but this isn’t discussed or explained properly.

    Yes, we agree that this symmetry could be learned, but this requires a lot more training data, which is not practical in terms of computational cost. In addition, this imposed across-unit symmetry makes different models with different M’s have the same number of parameters, which is a nice property to have when studying how the population size affects the model performance and trained filters.
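
    The parameter-count argument can be made concrete with a toy sketch. It assumes that each of the M units applies one shared set of four direction filters to its own patch of the motion signals; the array shapes, the rectified sum, and the names `population_response` and `n_parameters` are hypothetical and chosen only for illustration.

```python
import numpy as np


def population_response(patches, filters, bias):
    """Rectified responses of M units that share one set of filters.

    patches: (M, 4, k, k) motion signals seen by each unit, per direction
    filters: (4, k, k) filters shared by every unit
    bias:    scalar offset shared by every unit
    """
    drive = np.einsum("mdij,dij->m", patches, filters) + bias
    return np.maximum(drive, 0.0)


def n_parameters(filters, bias):
    """Trainable values in the shared model: filter entries plus one bias."""
    return filters.size + 1


rng = np.random.default_rng(1)
filters = rng.normal(size=(4, 3, 3))
bias = 0.1
small_population = population_response(rng.normal(size=(1, 4, 3, 3)), filters, bias)
large_population = population_response(rng.normal(size=(16, 4, 3, 3)), filters, bias)
```

    Here `n_parameters` returns the same value whether 1 or 16 units are simulated, so models with different M can be compared without a confound from model size.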

    5. The authors say that the "outward" model reproduces biology but I’m not sure that the details of the lobula plate circuitry match this claim. For instance, LPi neurons typically have broad arbors, making location specific inhibitory inputs unlikely. And is there evidence that the inhibitory inputs are limited to a small region, like in the model?

    The LPi neurons seem to be similar in size to the LPLC2 dendrites in the lobula plate (Klapoetke et al. (Nature, 2017), Figure 5K and Extended Data Figure 9). In our outward models (both linear receptive field and rectified inhibition), the inhibitory components are larger than the excitatory components when the number of units is large, which is at least consistent with potentially larger pooling of inhibitory signals than excitatory ones. Please refer to Essential Revision 4.

    6. Why not test the predictions of the model by analyzing the inputs onto the LPLC2 neurons using connectomics datasets?

    We would have loved to do this. Regrettably, the hemibrain dataset lopped off virtually all of the lobula plate. Our response to Essential Revision 1 expands a bit more on this point.

    Reviewer #3 (Public Review):

    Although collision detecting neurons have been identified across animals, the computations they perform remain unresolved. Here, Zhou et al. train artificial neural networks to predict collisions across a diverse set of visual stimuli and constrain network geometry using the known anatomy of a Drosophila looming detector cell type, LPLC2. Zhou et al. demonstrate that trained networks converge upon three solution types: an unstructured solution, a solution where inward motion is excitatory, and a solution where outward motion is excitatory. Interestingly, the solution excited by outward motion is also inhibited by inward motion as predicted for LPLC2 computations, and the output of these trained networks is highly similar to measured LPLC2 responses across stimuli.

    1. Strengths:
    a. The novelty of this study is that the networks are trained to solve a problem (collision detection) instead of being trained on neural data, but as a result are able to reproduce neural data.
    b. The authors investigate how collision detection solutions change when looming is computed by a single neuron versus a population of neurons. This is particularly interesting because looming detectors have been identified at both population and single neuron levels. These results shed light on why many different collision detection computations have been proposed across neurons and across species, as they may face different anatomical constraints. The results also provide novel computations that can be further investigated in vivo.
    c. The manuscript is well written, the figures are clear, and the movies are very helpful in understanding the approach and the results.

    2. Limitations:
    a. The findings could be strengthened by a more thorough characterization across the different solutions. For example, only two of many outward solutions are compared to actual neural data, and there is no explanation for why these two solutions were selected and whether they are representative of the entire category of outward solutions. There is also no metric for evaluating how well these solutions match the neural data.

    For a more detailed response to this comment, please see Essential Revisions 5. In particular, our focus on the linear receptive field model has eliminated this issue with the distribution of solutions in the main presentation of the results. We believe this is overall less confusing than the prior presentation of the more complicated rectified inhibition model.

    b. The inward solutions are dropped from the last section of the paper; however, it would be very interesting to see the output of example inward solutions in comparison to actual neural data.

    Please see Essential Revisions 2. We have added the inward solutions to Figure 10 in the supplemental figures.

    c. Within outward solutions, there are cases where inward inhibition is completely absent which does not follow what is known about LPLC2. The authors should mention this and also provide a comparison between outward solutions with or without inhibition.

    With the simpler, linear RF model, these are no longer the focus of the study. They do still exist in the rectified inhibition model solutions, which have substantial variability.

  3. Reviewer #1 (Public Review):

    This paper investigates what functional properties emerge from training an anatomically-constrained neural network on a specific computational task - detection of looming visual stimuli. Several functional models are identified by optimizing a network model for this task, and one of these models matches several properties observed in the fly neurons that perform the task. The approach and results are interesting. I did feel that several aspects of the work could be described more clearly, and that the potential of the model to reveal important aspects of the computation could be probed more thoroughly.

    Inhibitory component of model. The interplay between excitatory and inhibitory components of the model could be explored in more detail. A specific aspect that is interesting is the inclusion of rectification in the inhibitory circuit. Rectification is motivated by the extra neuron in the circuit providing inhibition (lines 155-157), but it is not clear why an additional neuron would require rectification. Are there physiological measurements that indicate that the extra neuron introduces rectification, or is that a speculation? Exploring whether rectification is important would also be interesting - e.g. by removing it from the trained models, and/or training models on circuits in which rectification is absent. Lines 360-362 mention interesting response properties created by inhibition, but do not define what those are. Including some of these extensions of the basic model could highlight the potential of the model to make predictions about specific circuit features that are important for detection of looming stimuli.

    Intuition for second model class. One of the key results in the paper is the existence of two classes of solution to the optimization problem - one of which follows the expectation for a detector based on outward optical flow, and the other of which does not. It is important to explain intuitively how the "inward" model is able to detect looming stimuli, given that it seems sensitive to the wrong optical flow features. This should be early - e.g. around lines 214-216.

    In general, the results would benefit from developing some arguments in more detail. One example is the paragraph on lines 232-237. The differences in performance in Figures 8C and D stick out to me as a reader, but I am not guided through those differences in the text. Intuition for why you see the change in relative performance of the two solutions (lines 266-268) would similarly be helpful. Another example is lines 290-292. These are several examples in which more explanation would be helpful, but you could look at the results in general with this in mind.

    The performance of the two classes of solution becomes more similar as the number of neurons increases. A concern is that this reflects saturation of performance rather than actual equivalence of the models. Can you make the task harder, e.g. by adding distracting optical flow? That might help separate performance of the different models and avoid saturation.

    Figure 10: how did you choose the specific outward solution used in this figure? More generally, some measure of the similarity of model components with experiment across all outward models is important. Currently the text reads as if you chose one of many models that happened to have components that looked like those measured. This comes up again on lines 310-311 and 313-315.

    Are there animals that detect looming stimuli with fewer loom detectors? If so it would be interesting to see if they have adopted a similar or different computation.

  4. Reviewer #2 (Public Review):

    The manuscript from Zhou et al. investigates how certain features of looming-detecting neurons can arise from optimizing a shallow neural network to detect imminent collisions. The authors consider architectures that resemble the known anatomy of LPLC2 neurons in Drosophila, with excitatory inputs from the four layers of motion detectors in the lobula plate and inhibitory inputs from the interneurons in those layers. The authors find that some fraction of the trained networks exhibit tuning properties of LPLC2 neurons, including (a) similar response profiles to stimuli that are not present in the training data; (b) similar dependence on the angular size of the looming object as opposed to angular velocity; and (c) similar dependence between peak response time and the ratio of size to speed of the looming object. The authors also find another solution among the trained networks that is very different from the biological circuits. However, they show that this other solution becomes less common as the number of neurons grows, which is the relevant regime for the biological circuit. This paper adds to a body of work that suggests that the structural or functional properties of brain circuits are the solution to an optimization problem implied by the task that they have to perform -- in this case, the ability to detect looming motion.

    The conclusions of the paper seem well supported within the class of models that was considered. The choice of class is, however, rather narrow and could be better explained and analyzed.

    1. One potentially confusing aspect of the work is that there are in fact three major types of solutions that are found, not only two as described in the abstract: apart from "outward" (similar to LPLC2) and "inward" (dissimilar to LPLC2) there are also "unstructured" solutions that, as far as I understand, basically fail to perform the task -- although their performance isn't adequately discussed. The authors comment on this in the Discussion, suggesting that the unstructured networks are local optima where the stochastic gradient descent algorithm they use for optimization gets stuck. They argue that evolutionary processes would be unlikely to linger there, implying that it might be fine to ignore these solutions. While reasonable, this claim is difficult to assess without more discussion of these results. These solutions are not a rare occurrence: according to the Methods, over half of the trained networks end up in the "unstructured" pile.
    2. The stimuli used in the paper are very simple: basically rigid, featureless objects moving in a straight line and at constant velocity, or rotating at constant angular velocity. Naturalistic stimuli are likely to be much more complex, which could hurt the training process. This is only briefly touched upon in the Discussion, leaving open the question of how the results of this work would change in more natural settings.

    3. The authors impose a 90-degree rotation symmetry as well as a reflection symmetry on the connection weights to the four layers of motion detectors that are sensitive to the four cardinal directions. Given that the training data that is used also has these symmetries, the question arises whether imposing these symmetries by hand was necessary. This is unfortunately not discussed in the paper.

    4. One highly confusing aspect is that there is, in fact, an additional symmetry: the same filters are used for all the subunits. The difference between the different subunits seems to be only in the inputs that they receive -- i.e., that they are responding to different parts of the visual field. This is only really apparent from the Methods. Given again the rotational symmetry of the inputs, it would be reasonable to assume that this symmetry could be learned, but this isn't discussed or explained properly.

    5. The authors say that the "outward" model reproduces biology but I'm not sure that the details of the lobula plate circuitry match this claim. For instance, LPi neurons typically have broad arbors, making location specific inhibitory inputs unlikely. And is there evidence that the inhibitory inputs are limited to a small region, like in the model?

    6. Why not test the predictions of the model by analyzing the inputs onto the LPLC2 neurons using connectomics datasets?

  5. Reviewer #3 (Public Review):

    Although collision detecting neurons have been identified across animals, the computations they perform remain unresolved. Here, Zhou et al. train artificial neural networks to predict collisions across a diverse set of visual stimuli and constrain network geometry using the known anatomy of a Drosophila looming detector cell type, LPLC2. Zhou et al. demonstrate that trained networks converge upon three solution types: an unstructured solution, a solution where inward motion is excitatory, and a solution where outward motion is excitatory. Interestingly, the solution excited by outward motion is also inhibited by inward motion as predicted for LPLC2 computations, and the output of these trained networks is highly similar to measured LPLC2 responses across stimuli.

    1. Strengths:
    a. The novelty of this study is that the networks are trained to solve a problem (collision detection) instead of being trained on neural data, but as a result are able to reproduce neural data.
    b. The authors investigate how collision detection solutions change when looming is computed by a single neuron versus a population of neurons. This is particularly interesting because looming detectors have been identified at both population and single neuron levels. These results shed light on why many different collision detection computations have been proposed across neurons and across species, as they may face different anatomical constraints. The results also provide novel computations that can be further investigated in vivo.
    c. The manuscript is well written, the figures are clear, and the movies are very helpful in understanding the approach and the results.

    2. Limitations:
    a. The findings could be strengthened by a more thorough characterization across the different solutions. For example, only two of many outward solutions are compared to actual neural data, and there is no explanation for why these two solutions were selected and whether they are representative of the entire category of outward solutions. There is also no metric for evaluating how well these solutions match the neural data.
    b. The inward solutions are dropped from the last section of the paper; however, it would be very interesting to see the output of example inward solutions in comparison to actual neural data.
    c. Within outward solutions, there are cases where inward inhibition is completely absent which does not follow what is known about LPLC2. The authors should mention this and also provide a comparison between outward solutions with or without inhibition.