Alternation emerges as a multi-modal strategy for turbulent odor navigation

Curation statements for this article:
  • Curated by eLife


Abstract

Foraging mammals exhibit a familiar yet poorly characterized phenomenon, ‘alternation’, a pause to sniff in the air preceded by the animal rearing on its hind legs or raising its head. Rodents spontaneously alternate in the presence of airflow, suggesting that alternation serves an important role during plume-tracking. To test this hypothesis, we combine fully resolved simulations of turbulent odor transport and Bellman optimization methods for decision-making under partial observability. We show that an agent trained to minimize search time in a realistic odor plume exhibits extensive alternation together with the characteristic cast-and-surge behavior observed in insects. Alternation is linked with casting and occurs more frequently far downwind of the source, where the likelihood of detecting airborne cues is higher relative to ground cues. Casting and alternation emerge as complementary tools for effective exploration with sparse cues. A model based on marginal value theory captures the interplay between casting, surging, and alternation.
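
As background on the optimization framework mentioned in the abstract: treating the search as a belief-MDP in which the agent pays one time step per action and stops when it reaches the source, the value of a belief b over source locations satisfies a Bellman equation. The notation below is generic (b for the belief, a for actions, o for observations, τ for the Bayesian belief update) and is intended only as a schematic reminder, not as the paper's exact formulation:

    \[
    V(b) \;=\; \min_{a}\Big[\, 1 \;+\; \sum_{o} P(o \mid b, a)\, V\big(\tau(b, a, o)\big) \Big],
    \qquad
    \tau(b, a, o)(s') \;\propto\; P(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s),
    \]

where the unit cost per step encodes the objective of minimizing the expected search time.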

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    I find the question relevant, the quantitative analysis carefully reasoned, and the results compelling and of broad interest. The authors should address the following comments, which mostly center around clarifying the assumptions made regarding the agents' prior knowledge, and the need for better placing this study within the context of previous research, especially regarding memory requirements of the strategy and comparison with more reactive (memory-less) strategies. Finally, a broader discussion of the limitations of the current study (e.g. what happens if x_thr and y_thr change over time?) and of the next steps would strengthen the paper.

    An assumption behind the entire study is that agents can hold in memory their belief, which in this case is their location relative to the expected location of the source. Over time this memory enables agents that start with a wide prior to refine their belief. This strong assumption makes the strategy discussed here quite different from other, more reactive strategies proposed in the literature that do not require agents to build an internal map of the expected location of the source. While it is easy for a robot to maintain such a memory, how and to what extent animals do so using known mechanisms such as path integration and/or systems such as grid and place cells is less clear. A more explicit description of the key memory requirements of the strategy discussed here (once learned) and a discussion of how it might be implemented by animals, as well as a discussion of the differences in that respect with other strategies proposed in the literature, including reactive strategies, would strengthen the paper and significantly broaden its significance.

    Along the same lines, the study assumes that the agent stores an internal model of the statistics of the plume, e.g. x_thr, y_thr, L_y, etc. The predictions made in 6e/f, for example, are likely only valid if the agent already knows the constraints of the plume it is searching for (i.e. x_thr and y_thr), which seems unlikely in most natural scenarios. Perhaps the authors could discuss some ways in which these might be inferred. The authors nicely show that an agent trained with the Poisson model navigates well even in the full time-dependent simulation. But what is missing is a discussion of how animals would get trained in the first place and what information they would need access to in order to do so. Perhaps examine how an agent trained in environment A performs in environment B as a function of how large the statistical differences between environments A and B are. One could, for example, change the Poisson statistics between A and B.

    Following the reviewer’s suggestion, we have added an entire new paragraph in the final discussion section (lines 430-455). There, we comment on reactive vs. cognitive search strategies, and we provide details on the memory requirements of the algorithm presented here and on its robustness to misrepresentations of the environmental flow. In particular, Supplementary Figure 1 includes a violin plot showing that performance is bimodal when the number of training episodes is low (panel B), and shows how the number of alpha-vectors increases as the number of training episodes grows.
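
    For readers unfamiliar with the term used above: in point-based POMDP solvers, the value function is typically stored as a finite set of alpha-vectors and evaluated as a maximum of dot products with the current belief, so the number of alpha-vectors is one measure of policy complexity. A minimal sketch of that representation, with placeholder numbers not taken from the paper, is:

        import numpy as np

        # A value function represented by a set of alpha-vectors (one per row);
        # each vector has one entry per discretized belief state. The numbers
        # below are placeholders for illustration only.
        alpha_vectors = np.array([
            [ 0.0, -1.0, -2.0, -3.0],
            [-1.5, -0.5, -1.5, -2.5],
        ])

        def value(belief, alphas):
            """Evaluate the POMDP value function: max over alpha-vectors of <alpha, belief>."""
            return float(np.max(alphas @ belief))

        belief = np.array([0.4, 0.3, 0.2, 0.1])   # belief over four discretized states
        print(value(belief, alpha_vectors))

    A larger set of alpha-vectors can represent a more finely resolved policy, which is in line with the growth with training episodes reported in Supplementary Figure 1.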

    Supplementary Figure 4 shows the performance of the algorithm when the model of the environment is varied as suggested by the reviewer, i.e. the agent is trained in environment A and performs its search in environment B. The plot reports performance as a function of the difference between environments A and B.

    Related to the previous point: the simulated plume is straight, i.e. there is no variation in the mean flow and therefore no random meandering of the plume. This means that once the walker hits the center of the plume, if it orients upwind, it is likely to reach the source because there is a continuous stream of odor on the ground it can follow, with just a few casts whenever it drifts slightly off the centerline. Is there a way for the authors to explore what would happen in the case of meandering plumes without having to run another massive simulation? Perhaps a simplified model of the odor plume could be used, or one could even just use the same simulated plume Poisson statistics but translate this solution perpendicular to the main flow at a slow oscillatory rate. Will the navigator now stop and sniff in the air more often? Will these sniffing events coincide with moments when the navigator loses the plume? Will agents still be able to use a constant x_thr and y_thr, or would they have to learn their statistics? Or will agents revert to a more memoryless or hybrid strategy?

    Following the reviewer’s suggestion, we have added a plot in Supplementary Figure 4 that shows the performance of our algorithm when it is trained in a fixed mean flow and searches in a meandering flow where the direction of the mean flow changes with time. The results confirm the robustness of the algorithm with respect to incorrect modeling of the environmental flow. The behavior of the agent in these more challenging conditions is largely consistent with that observed for the static plume. Performance degrades when meandering is more pronounced; we expect that training over unsteady conditions will become necessary under even more extreme oscillations.
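
    As a concrete illustration of the kind of meandering the reviewer proposes (translating the plume perpendicular to the mean flow at a slow oscillatory rate), one can sample a static plume along a sinusoidally displaced centerline. The sketch below is purely schematic; the amplitude, period, and toy plume are illustrative and are not the fields or parameters used in our simulations:

        import numpy as np

        def sample_meandering(odor_field, x, y, t, amplitude=5.0, period=200.0):
            """Sample a static plume along a centerline that oscillates slowly
            in the crosswind (y) direction; parameters are illustrative."""
            y_offset = amplitude * np.sin(2.0 * np.pi * t / period)
            # Shift the query point back into the frame of the static plume.
            return odor_field(x, y - y_offset)

        # Toy Gaussian plume, used only to make the example runnable.
        toy_plume = lambda x, y: np.exp(-x / 50.0) * np.exp(-(y ** 2) / (2.0 * 4.0 ** 2))
        print(sample_meandering(toy_plume, x=20.0, y=0.0, t=50.0))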

    How does the benefit of sniffing the ground vs the air change if odor molecules adsorb and desorb on the surface, thus increasing the distance from the source where ground odor can be detected?

    The issue of adsorption was discussed in the body of the paper, and we have now added a comment in the final discussion to increase its visibility; the comments are found at lines 395-401.

    There is a difference in clarity between the first part of the paper and the second part that starts at line 232 with the section "Searching for airborne cues". I recommend the authors work on that second section to improve clarity. For example, the goal of that section is not immediately clear. The first paragraph talks about expanding on the intuition gained from the first part and "to address the search dynamics" but does not spell out what key question about search dynamics is to be addressed. This only becomes clear at line 260. Knowing where this is going would help readers understand the motivation behind the simplified model. Maybe lines 258-263 or something similar could be moved into the first paragraph of that section. Also, related to the previous comments, it would be helpful to clearly state what is assumed known by the agent and what is not. Is the agent assumed to have learned the values of x_thr, v and N in equation (2) before starting the search? As we progress through that section, important details start to be omitted, making it more difficult to follow. For example, what is the definition of t_sniff (I am guessing it is given in line 313?)? What is meant by optimization depth (line 316)? What is meant by episode index? Is this referring to N (line 322)? Can the authors provide intuition about why the optimized casting strategy expands over time rather than starting wide right away (line 315)?

    The section "Searching for airborne cues" has been significantly revised for clarity throughout. The specific points raised by the reviewer are addressed as follows:

    • we spelled out the key questions in the first paragraph, as suggested (lines 245-249)

    • we clarified the assumptions about the agent's model and reward structure (lines 250-264)

    • we added the definition of t_sniff (line 327), of optimization depth (lines 327-330, 339), and of the episode index (line 345)

    • we explained the intuition behind the casting strategy (lines 331-337)

  2. Evaluation Summary:

    This work provides an insightful analysis of how animals can use different types of sniffing to quickly find the sources of odorants in natural, often turbulent, environments. As it turns out, the air near the ground is less turbulent but does not provide high precision information about the location of sources that are far away. To get that kind of information, animals have to pause and sniff in the air. Authors show that the relative balance between sniffing near the ground and in the air shifts as the animals approach the source and that this shift matches optimal strategies that can be pursued based on partially observable statistical models of the environment. The paper also includes a very useful set of simulations of odorant flow in the presence of obstacles that will be made publicly available.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  3. Reviewer #1 (Public Review):

    To address this question, the authors combine fully resolved fluid mechanics numerical simulations of an odor plume with the framework of partially observable Markov decision processes (POMDPs), a framework for devising the optimal decision-making policy that an autonomous agent should use in order to achieve its goal when it has only partial access to the information needed to guide its decisions. The main result is that while stopping to sniff in the air bears the cost of halting progression towards the source - animals tend to stop moving to sniff in the air - this is offset by the benefit of being able to detect odor packets at a larger distance from the source (odors travel a shorter distance near the ground). Interestingly, sniffing in the air takes place more often when the agent loses the plume and tends to coincide with periods when the agent casts crosswind (a known strategy used by animals to regain contact with the plume), while sniffing near the ground is preferred when the agent is within the plume.
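
    As an aside on the POMDP ingredient summarized here: the agent's partial information is captured by a belief, i.e. a probability distribution over candidate source locations, which is updated by Bayes' rule after each detection or non-detection. A minimal, generic sketch (with illustrative numbers, not the authors' implementation) is:

        import numpy as np

        def update_belief(belief, detected, p_detect):
            """One Bayesian update of a belief over candidate source locations.
            p_detect[i] is the probability of detecting odor this step if the
            source sits in cell i; both arrays here are illustrative."""
            likelihood = p_detect if detected else 1.0 - p_detect
            posterior = belief * likelihood
            return posterior / posterior.sum()

        prior = np.full(5, 0.2)                            # uniform prior over 5 cells
        p_detect = np.array([0.5, 0.3, 0.1, 0.05, 0.02])   # detection more likely near cell 0
        print(update_belief(prior, detected=True, p_detect=p_detect))
        print(update_belief(prior, detected=False, p_detect=p_detect))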

    In the second part of the paper, the authors concentrate on the search dynamics far from the source, where ground cues tend to be absent. Combining analytical calculations and a simplified POMDP (for this part they ignore ground sniffing), they ask: 1) how wide should the agent cast? 2) how long should the agent spend casting before surging upwind? 3) where should the agent sniff during the casting phase? Here the main results are that surge length and cast width should equal the detection range x_thr and the prior width of the plume L_y, respectively. They also find that the optimal time to spend casting obeys marginal value theory, i.e. it is the time at which the marginal value of staying in a cast equals that of surging and exploring a new, yet unexplored patch in the agent's belief of its own position relative to the source. These results provide a rationale for the observed alternation between ground and air sniffing, for casting, and for how the timing between these events should be selected.
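
    For reference, the textbook statement of the marginal value theorem, written here with generic symbols rather than the paper's notation, reads

        \[
        \left.\frac{dg}{dt}\right|_{t=t^{*}} \;=\; \frac{g(t^{*})}{t^{*} + T_{\text{travel}}},
        \]

    where g(t) is the expected gain accumulated after spending a time t in the current patch (loosely, casting within the currently explored region of the belief) and T_travel is the time needed to reach a fresh patch (loosely, surging); the patch should be abandoned once the marginal rate of gain drops to the overall average rate.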

    I find the question relevant, the quantitative analysis carefully reasoned, and the results compelling and of broad interest. The authors should address the following comments, which mostly center around clarifying the assumptions made regarding the agents' prior knowledge, and the need for better placing this study within the context of previous research, especially regarding memory requirements of the strategy and comparison with more reactive (memory-less) strategies. Finally, a broader discussion of the limitations of the current study (e.g. what happens if x_thr and y_thr change over time?) and of the next steps would strengthen the paper.

    An assumption behind the entire study is that agents can hold in memory their belief, which in this case is their location relative to the expected location of the source. Over time this memory enables agents that start with a wide prior to refine their belief. This strong assumption makes the strategy discussed here quite different from other, more reactive strategies proposed in the literature that do not require agents to build an internal map of the expected location of the source. While it is easy for a robot to maintain such a memory, how and to what extent animals do so using known mechanisms such as path integration and/or systems such as grid and place cells is less clear. A more explicit description of the key memory requirements of the strategy discussed here (once learned) and a discussion of how it might be implemented by animals, as well as a discussion of the differences in that respect with other strategies proposed in the literature, including reactive strategies, would strengthen the paper and significantly broaden its significance.

    Along the same lines, the study assumes that the agent stores an internal model of the statistics of the plume, e.g. x_thr, y_thr, L_y, etc. The predictions made in 6e/f, for example, are likely only valid if the agent already knows the constraints of the plume it is searching for (i.e. x_thr and y_thr), which seems unlikely in most natural scenarios. Perhaps the authors could discuss some ways in which these might be inferred. The authors nicely show that an agent trained with the Poisson model navigates well even in the full time-dependent simulation. But what is missing is a discussion of how animals would get trained in the first place and what information they would need access to in order to do so. Perhaps examine how an agent trained in environment A performs in environment B as a function of how large the statistical differences between environments A and B are. One could, for example, change the Poisson statistics between A and B.
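
    The cross-environment test suggested here can be summarized as a simple protocol: fix the policy obtained under one set of Poisson detection statistics and evaluate it under another, sweeping the mismatch. The sketch below only illustrates how the cue statistics of the two environments would differ; the rates are hypothetical, and the actual evaluation would feed the environment-B observations to the policy trained in environment A:

        import numpy as np

        rng = np.random.default_rng(0)

        lam_A = 0.5                           # hypothetical mean detections per step used for training
        for lam_B in [0.5, 0.25, 0.1, 0.05]:  # detection rates of the test environment B
            cues_B = rng.poisson(lam_B, size=10_000)   # cues the agent would actually receive
            mismatch = abs(lam_B - lam_A) / lam_A      # one simple measure of the A-vs-B difference
            print(f"lam_B={lam_B:.2f}  mean cues/step={cues_B.mean():.3f}  mismatch={mismatch:.2f}")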

    Related to the previous point: the simulated plume is straight, i.e. there is no variation in the mean flow and therefore no random meandering of the plume. This means that once the walker hits the center of the plume, if it orients upwind, it is likely to reach the source because there is a continuous stream of odor on the ground it can follow, with just a few casts whenever it drifts slightly off the centerline. Is there a way for the authors to explore what would happen in the case of meandering plumes without having to run another massive simulation? Perhaps a simplified model of the odor plume could be used, or one could even just use the same simulated plume Poisson statistics but translate this solution perpendicular to the main flow at a slow oscillatory rate. Will the navigator now stop and sniff in the air more often? Will these sniffing events coincide with moments when the navigator loses the plume? Will agents still be able to use a constant x_thr and y_thr, or would they have to learn their statistics? Or will agents revert to a more memoryless or hybrid strategy?

    How does the benefit of sniffing the ground vs the air change if odor molecules adsorb and desorb on the surface, thus increasing the distance from the source where ground odor can be detected?

    There is a difference in clarity between the first part of the paper and the second part that starts at line 232 with the section "Searching for airborne cues". I recommend the authors work on that second section to improve clarity. For example, the goal of that section is not immediately clear. The first paragraph talks about expanding on the intuition gained from the first part and "to address the search dynamics" but does not spell out what key question about search dynamics is to be addressed. This only becomes clear at line 260. Knowing where this is going would help readers understand the motivation behind the simplified model. Maybe lines 258-263 or something similar could be moved into the first paragraph of that section. Also, related to the previous comments, it would be helpful to clearly state what is assumed known by the agent and what is not. Is the agent assumed to have learned the values of x_thr, v and N in equation (2) before starting the search?

    As we progress through that section, important details start to be omitted, making it more difficult to follow. For example, what is the definition of t_sniff (I am guessing it is given in line 313?)? What is meant by optimization depth (line 316)? What is meant by episode index? Is this referring to N (line 322)? Can the authors provide intuition about why the optimized casting strategy expands over time rather than starting wide right away (line 315)?

  4. Reviewer #2 (Public Review):

    This manuscript describes how animals can more accurately find targets using olfactory cues by alternating sniffing close to the ground and sniffing up in the air. Near the ground, the air is less turbulent but contains signals of a smaller magnitude. High in the air, the signals propagate further but are more intermittent. The authors perform large-scale simulations of odor concentrations taking into account wind, turbulence, and the impact of the boundary layer near the ground. The authors then use these simulation results to find the optimal sequence of decisions about moving forward, sniffing near the ground, or sniffing up in the air. The decisions are made using partially observable Markov decision processes.
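
    To make the decision problem concrete, the choice among the three actions can be pictured as a one-step lookahead over the current belief: each action pays one time step and leads to an updated belief whose value is then scored. The sketch below is generic; the detection probabilities, the placeholder value function, and the assumption that no new cue arrives while moving are all illustrative, not the authors' model:

        import numpy as np

        ACTIONS = ["move_upwind", "sniff_ground", "sniff_air"]

        def expected_cost(action, belief, p_ground, p_air, V):
            """One time step of cost plus the expected value of the updated belief."""
            if action == "move_upwind":
                p_detect = np.zeros_like(belief)   # illustrative: no new cue while moving
            elif action == "sniff_ground":
                p_detect = p_ground                # short-range but reliable cues
            else:
                p_detect = p_air                   # long-range but sparse cues
            p_hit = float(belief @ p_detect)
            cost = 1.0
            if p_hit > 0.0:
                post_hit = belief * p_detect
                cost += p_hit * V(post_hit / post_hit.sum())
            post_miss = belief * (1.0 - p_detect)
            cost += (1.0 - p_hit) * V(post_miss / post_miss.sum())
            return cost

        V = lambda b: -float(np.max(b))            # placeholder: concentrated beliefs are "cheaper"
        belief = np.full(4, 0.25)
        p_ground = np.array([0.6, 0.2, 0.05, 0.0])
        p_air = np.array([0.3, 0.25, 0.2, 0.15])
        print(min(ACTIONS, key=lambda a: expected_cost(a, belief, p_ground, p_air, V)))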

    The simulations of odor distributions will provide a useful contribution to the field, independent of the conclusions on optimal search strategies.