Collaborative hunting in artificial agents with deep reinforcement learning

Abstract

Collaborative hunting, in which predators play different and complementary roles to capture prey, has traditionally been regarded as an advanced hunting strategy requiring large brains capable of high-level cognition. However, recent reports documenting collaborative hunting in smaller-brained vertebrates have called this belief into question. Here, using computational multi-agent simulations based on deep reinforcement learning, we demonstrate that the decisions underlying collaborative hunts do not necessarily rely on sophisticated cognitive processes. We found that apparently elaborate coordination can be achieved through a relatively simple decision process that maps states to actions via distance-dependent internal representations formed by prior experience. Furthermore, we confirmed that this decision rule of the predators is robust against unfamiliar prey controlled by humans. Our computational ecological results emphasize that collaborative hunting can emerge in various intra- and inter-specific interactions in nature, and they provide insights into the evolution of sociality.

Article activity feed

  1. eLife assessment

    In this study, deep learning methods are deployed in the context of a group hunting scenario wherein two predators pursue a single prey. Through deep learning, the two predators achieve higher predation success than occurs with single predators. Much of the evidence in this important study is solid, with implications for future work on the ethology and simulation of cooperative behaviors.

  2. Reviewer #1 (Public Review):

    Predator-prey interactions often involve one predator and one prey. Where more than one predator hunts a single prey, a key question is whether the predators involved are cooperating in some manner. Where this has been observed in biology, it has been suggested that complex cognitive processes may be needed to support the cooperation, such as each predator representing the intention of other predators. In this study the authors ask whether cooperation can emerge in a highly idealized scenario with little more than a basic reinforcement learning approach. Due to the size of the resulting state space, computing the value function becomes computationally cumbersome, so a function approximation method using a variant of a deep Q-network (DQN) is used. The authors have successfully shown that cooperation, here operationalized as a higher success rate with two predators in the context of sharing of the reward (prey that's captured), can emerge in this context. Further, they show that a cluster-based analysis of the DQN can guide the generation of a short description length rule-based approach that they also test and show qualitative agreement with the original DQN results.
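
    To make this concrete, below is a minimal sketch of the kind of value-function approximation the review describes: a small deep Q-network that maps a continuous predator-prey state to one Q-value per discrete action, trained toward a one-step bootstrapped target. The state dimension, layer widths, and function names are illustrative assumptions, not the authors' architecture.

    ```python
    # Minimal DQN sketch (assumed sizes): an MLP maps a state vector to one
    # Q-value per discrete action; td_target computes the one-step target.
    import torch
    import torch.nn as nn


    class QNetwork(nn.Module):
        def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),  # one Q-value per action
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.net(state)


    def td_target(q_target: QNetwork, reward, next_state, done, gamma=0.99):
        """One-step Q-learning target: r + gamma * max_a' Q_target(s', a')."""
        with torch.no_grad():
            max_next_q = q_target(next_state).max(dim=-1).values
        return reward + gamma * (1.0 - done) * max_next_q
    ```

    A full DQN would additionally use an experience replay buffer and a periodically updated target network to stabilize these bootstrapped targets, consistent with the "variant of a deep Q-network" mentioned above.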

    Strengths of the work include providing a demonstration proof that cooperation can emerge with simple rules in a predator-prey context, suggesting that its emergence over phylogenetic time within certain clades of animals may not require the complex cognitive processes that prior work has suggested may be needed. Given the simplicity of the rules, one possible outcome could be a widening of investigation into cooperative hunting beyond the usual small number of species in which it has been observed, such as chimpanzees, seals, dolphins, whales, wild dogs, and big cats. The authors have done well to show how, with a variety of adjustments, a DQN can be used to gain insight into a complex ethological phenomenon.

    One weakness of the work is the simplicity of the environment: a 2D plane that is 10 body lengths in each dimension, with full observability and no limitations on movement besides the boundaries of the space. Recent literature suggests that more complex phenomena such as planning may only evolve in the context of partial observability in predator-prey interactions. Thus the absence of more advanced tactics on the part of the predator agents may reflect limitations imposed by the simplicity of the behavioral arena, or limitations of associative learning alone in driving the emergence of these tactics. Another weakness is that although correlations in network activity are discussed, and used to generate a rule-based approach that succeeds in replicating some of the results, there is no further analysis that goes beyond correlation to a causal account.
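
    For reference, here is a minimal sketch of the arena described above: a fully observable 2D square, roughly 10 body lengths on a side, where the only constraint on movement is the boundary. The capture radius, dynamics, and names are assumptions for illustration, not the authors' exact environment.

    ```python
    # Toy arena sketch (assumed constants): agents move freely inside a
    # 10 x 10 square; every agent observes all positions; the prey is
    # caught when any predator comes within CAPTURE_RADIUS.
    import numpy as np

    ARENA_SIZE = 10.0      # side length, in body lengths
    CAPTURE_RADIUS = 0.5   # assumed capture distance


    def step(position: np.ndarray, velocity: np.ndarray) -> np.ndarray:
        """Move an agent and clip it to the arena boundaries."""
        return np.clip(position + velocity, 0.0, ARENA_SIZE)


    def full_observation(predators: np.ndarray, prey: np.ndarray) -> np.ndarray:
        """Full observability: every agent sees all positions, concatenated."""
        return np.concatenate([predators.ravel(), prey.ravel()])


    def caught(predators: np.ndarray, prey: np.ndarray) -> bool:
        """True if any predator is within the capture radius of the prey."""
        return bool((np.linalg.norm(predators - prey, axis=1) < CAPTURE_RADIUS).any())
    ```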

  3. Reviewer #2 (Public Review):

    This paper demonstrates that model-free reinforcement learning, with relatively small networks, is sufficient to produce collaborative hunting in predator-prey environments. The paper then studies the conditions under which collaborative hunting emerges (namely, the difficulty of hunting and the sharing of the spoils), which is an interesting question to study, and it includes a fascinating study in which a human is tasked with controlling the prey. However, the simplicity of the environment, a 2D particle world with simple dynamics, makes it unclear how generalizable the results are, and the results rely heavily on visual interpretation of t-SNE plots rather than more direct metrics.

    Strengths:
    - The distinct behaviors uncovered between the predators in shared vs. not-shared reward are quite interesting!
    - The realization that the ability of deep RL models to solve predator-prey problems has implications for models of what is needed for collaborative hunting is clever.

    Weaknesses:
    - The paper seems to claim that, because this problem is solvable with model-free learning or a model-free decision tree, complicated cognition is not needed for collaborative hunting. However, the setting in which this hunting is done is exceedingly simple, and it is possible that in more complex settings, such as partially observable ones or ones where the capabilities of the partners are unknown, more complicated forms of cognition might still be needed.
    - The problem is fully observed (I think), so there may be one uniquely good strategy that the predators can use that will work against all prey. If this is the case, the human studies are of limited value; they merely confirm that the problem has a near-deterministic solution on the part of the predators.

  4. Reviewer #3 (Public Review):

    This paper aims to understand the nature of collaborative hunting. It sets out by first defining simple conditions under which collaborative hunting emerges, which leads to a toy environment. The environment itself is simple: K predators chasing a single prey with no occlusions. I find this a little strange, since it was my understanding that collaborative hunting emerges in part because the presence of occlusions allows for more complex strategies that require planning.

    That being said, I do think the environment is sufficient for this paper, and I quite enjoyed using it to run some toy experiments. It reminds me of some of the simpler environments from PettingZoo, a library for multi-agent reinforcement learning.
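
    For readers who want to try a similar toy experiment, here is a sketch of a random-policy rollout in PettingZoo's "simple_tag" pursuit environment, which is close in spirit to the environment described here. The version suffix (simple_tag_v3), the keyword arguments, and the 5-tuple returned by env.last() depend on the installed PettingZoo release and are assumptions.

    ```python
    # Random-policy rollout in a PettingZoo pursuit environment (2 predators,
    # 1 prey, no obstacles). API details may vary across PettingZoo versions.
    from pettingzoo.mpe import simple_tag_v3

    env = simple_tag_v3.env(num_good=1, num_adversaries=2, num_obstacles=0)
    env.reset(seed=0)

    for agent in env.agent_iter():
        obs, reward, termination, truncation, info = env.last()
        if termination or truncation:
            action = None  # finished agents must be stepped with None
        else:
            action = env.action_space(agent).sample()
        env.step(action)

    env.close()
    ```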

    Once a simple environment was established, the authors fit a reinforcement learning model to it. In this case, the model is Q-learning. The predators and the prey are treated as separate agents in the environment, each with its own independent Q-function. Each agent gets full observability of the surroundings. As far as I understand, the predators do not share an action space, and so they can only collaborate implicitly by inferring the actions of the other predators. However, there is a version of these experiments in which the reward function is shared, with all agents receiving a reward of 1 when the prey is caught. One limitation of the current work is that it does not consider reinforcement learning methods in which a value function is shared. This is currently a dominant strategy in multi-agent RL; see, for example, OpenAI Five and Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Missing these algorithms limits the scope of the work.
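
    A short sketch of the setup described above, under stated assumptions: each predator is an independent learner with its own Q-function and replay buffer, and the only coupling between them is whether the capture reward is shared. Class and function names are hypothetical, not taken from the paper.

    ```python
    # Independent learners: each predator keeps its own (private) experience
    # buffer and would keep its own Q-network; coupling happens only through
    # the environment and the optionally shared reward.
    import random
    from collections import deque


    class IndependentAgent:
        def __init__(self, n_actions: int, buffer_size: int = 10_000):
            self.n_actions = n_actions
            self.replay = deque(maxlen=buffer_size)  # private experience buffer
            # a per-agent Q-network would live here; omitted to keep the sketch short

        def act(self, obs, epsilon: float = 0.1) -> int:
            # epsilon-greedy placeholder; the greedy action would come from the Q-net
            return random.randrange(self.n_actions)

        def remember(self, obs, action, reward, next_obs, done):
            self.replay.append((obs, action, reward, next_obs, done))


    def step_rewards(capturer: str, predators, share_reward: bool):
        """On capture: both predators get 1 if the reward is shared, else only the capturer."""
        return {p: 1.0 if (share_reward or p == capturer) else 0.0 for p in predators}
    ```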

    Having fit an RL model, the authors' next order of business is to search for internal representations in the agents' models that correspond to collaboration. The authors use t-SNE embeddings of the agents' last hidden layers in the policy network.
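
    As a sketch of this kind of analysis (with an assumed network and random stand-in data rather than the authors' trained agents), one can record the last hidden layer's activations over many states with a forward hook and embed them in two dimensions with scikit-learn's t-SNE:

    ```python
    # Embed last-hidden-layer activations of a (hypothetical) Q-network with t-SNE.
    import torch
    import torch.nn as nn
    from sklearn.manifold import TSNE

    net = nn.Sequential(
        nn.Linear(8, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),  # last hidden layer whose activity we embed
        nn.Linear(64, 5),              # Q-values for 5 discrete actions
    )

    states = torch.randn(1000, 8)      # stand-in for recorded game states

    # Capture the last hidden layer's activations with a forward hook.
    activations = []
    handle = net[3].register_forward_hook(lambda m, i, o: activations.append(o.detach()))
    with torch.no_grad():
        net(states)
    handle.remove()

    hidden = torch.cat(activations).numpy()                    # shape (1000, 64)
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(hidden)
    print(embedding.shape)                                     # (1000, 2)
    ```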

    Analyzing these embeddings in Figure 3, we see that there are some representations that correspond to specific types of collaborative behavior, which indicates that the model is indeed learning to encode collaboration. I should note that this is not surprising from an RL perspective. Certainly, we are aware that multi-agent actor-critic methods can exhibit cooperative behavior; see Emergent Tool Use from Multi-Agent Interaction, where agents jointly learn to push boxes together. It is true that earlier work didn't specifically identify the units responsible for this behavior, and I think this work should be lauded for the novelty it brings to this discussion.

    A large underlying point of this paper seems to be that we need to consider these simple toy environments, where we can easily train Q-learning, because it is impossible to analyze the behaviors that emerge from real animal behavior (see lines 187-189). This makes sense on the surface, because there are no policy weights in the case of real-world behavior. However, it is unfortunately misleading. It is entirely possible to take existing animal behavior, fit a linear model (or a deep net) to this behavior, and then do t-SNE on the fitted model. This is referred to as behavioral cloning. What's more, offline RL makes it entirely possible to fit a Q-function to animal behaviors, in which case the exact same t-SNE analysis can be carried out without ever running Q-learning in the environment. From my perspective, the fact that RL is not needed to carry out the paper's main analysis is the biggest weakness of the paper.
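
    To illustrate the behavioral-cloning alternative described above: fit a policy network directly to recorded (state, action) pairs, after which exactly the same hidden-layer t-SNE analysis could be applied to the fitted model. The data here are random stand-ins and all names are assumptions.

    ```python
    # Behavioral cloning sketch: supervised imitation of observed (state, action)
    # pairs; the fitted network can then be probed like any trained policy.
    import torch
    import torch.nn as nn

    policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 5))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Stand-ins for observed predator states and the discretized actions taken.
    states = torch.randn(5000, 8)
    actions = torch.randint(0, 5, (5000,))

    for epoch in range(10):
        logits = policy(states)
        loss = loss_fn(logits, actions)  # supervised imitation objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    ```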

    Meanwhile, I do think the comparisons with human players were exceptionally interesting, and I'm glad they were included in this work.

    Finally, I would like to speak to the reinforcement learning sections of this paper, as this is my personal area of expertise. I will note that the RL used in this paper is all valid and correct. The descriptions of Q-learning and its modifications are technically accurate. It's worth noting that the performance offered by the Q-learning methods in this paper is not particularly close to optimal. I mean this in two ways. First, cooperative RL methods are much more performant. Second, the Q-learning implementation considered by the authors is far below state-of-the-art standards.

    I will also note that, from the perspective of RL, there is no novelty in the paper. Indeed, many DeepMind papers, including the original DQN paper, have used similar t-SNE embeddings to try to understand the learned representations. And works such as Sentiment Neuron and Visualizing and Understanding Recurrent Networks, among many others, have focused on the problem of understanding the correspondence between network weights and behaviors. Thus, the novelty must come from a biological perspective, or perhaps from one at the intersection of biology and RL. I do believe this is an area worth further study.