Food-washing monkeys recognize the law of diminishing returns

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This is a valuable study that tests the functional role of food-washing behavior in removing tooth-damaging sand and grit in long-tailed macaques and whether dominance rank predicts level of investment in the behavior. The evidence that food-washing is deliberate is compelling and the evidence that individual investment in the behavior varies is solid. Overall, the paper should be of interest to researchers interested in foraging behavior, cognition, and primate evolution.

Abstract

Few animals have the cognitive faculties or prehensile abilities needed to eliminate tooth-damaging grit from food surfaces. Some populations of monkeys wash sand from foods when standing water is readily accessible, but this propensity varies within groups for reasons unknown. Spontaneous food-washing emerged recently in a group of long-tailed macaques (Macaca fascicularis) inhabiting Koram Island, Thailand, and it motivated us to explore the factors that drive individual variability. We measured the mineral and physical properties of contaminant sands and conducted a field experiment, eliciting 1,282 food-handling bouts by 42 monkeys. Our results verify two long-standing presumptions, that monkeys have a strong aversion to sand and that removing it is intentional. Reinforcing this result, we found that monkeys clean foods beyond the point of diminishing returns, a suboptimal behavior that varied with social rank. Dominant monkeys abstained from washing, a choice consistent with the impulses of dominant monkeys elsewhere: to prioritize rapid food intake and greater reproductive fitness over the long-term benefits of prolonging tooth function.

Article activity feed

  1. eLife Assessment

    This is a valuable study that tests the functional role of food-washing behavior in removing tooth-damaging sand and grit in long-tailed macaques and whether dominance rank predicts level of investment in the behavior. The evidence that food-washing is deliberate is compelling and the evidence that individual investment in the behavior varies is solid. Overall, the paper should be of interest to researchers interested in foraging behavior, cognition, and primate evolution.

  2. Reviewer #1 (Public review):

    In this paper, the authors had 2 aims:

    (1) Measure macaques' aversion to sand and see if its removal is intentional, as it is likely an unpleasurable sensation that causes tooth damage.

    (2) Show that or see if monkeys engage in suboptimal behavior by cleaning foods beyond the point of diminishing returns, and see if this was related to individual traits such as sex and rank, and behavioral technique.

    They attempted to achieve these aims through a combination of geochemical analysis of sand, field experiments, and comparing predictions to an analytical model.

    The authors' conclusions were that they verified a long-standing assumption that monkeys have an aversion to sand as it contains many potentially damaging fine grained silicates, and that removing it via brushing or washing is intentional.

    They also concluded that monkeys will clean food for longer than is necessary, i.e. beyond the point of diminishing returns, and that this is rank-dependent.

    High and low-ranking monkeys tended not to wash their food, but instead over-brushed it, potentially to minimize handling time and maximize caloric intake, despite the long-term cumulative costs of sand.

    This was interpreted through the *disposable soma hypothesis*, where dominants maximize immediate needs to maintain rank and increase reproductive success at the potential expense of long-term health and survival.

    Strengths:

    The field experiment seemed well designed, and their quantification of the physical and mineral properties of quartz particles (relative to human detection thresholds) seemed good relative to their feret diameter and particle circularity (to a reviewer that is not an expert in sand). The *Rank Determination* and *Measuring Sand* sections were clear.

    In achieving Aim 1, the authors validated a commonly interpreted, but unmeasured function, of macaque and primate behavior-- a key study/finding in primate food processing and cultural transmission research.

    I commend their approach in trying to develop a quantitative model to generate predictions to compare to empirical data for their second aim.
    This is something others should strive for.

    I really appreciated the historical context of this paper in the introduction and found it very enjoyable and easy to read.

    I do think that interpreting these results in the context of the *disposable soma hypothesis* and the potential implications in the *paleolithic matters* section about interpreting dental wear in the fossil record are worthwhile.

  3. Author response:

    The following is the authors’ response to the previous reviews

    We thank the editors and Reviewers 1 and 3 for their thoughtful consideration of our manuscript. The present revision is submitted to address comments raised concerning rank determinations and the following sentence in the editorial assessment:

    The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank, including the fitness-relevance and ultimate evolutionary implications of the findings, is incomplete given limitations of the experimental design.

    Close reading of this sentence reveals two parallel threads. The first can be read as “…evidence for variable rank is incomplete given the limitations of the experimental design,” whereas the second can be read as “…evidence for adaptive investment and fitness is incomplete given the limitations of the experimental design.” The first alludes to a critique of our methods, while the second alludes to points of discussion unrelated to our experimental design. Unpacking this sentence is important because it casts the totality of our paper as “incomplete,” a word of consequence for early-career scholars because it prevents indexing in Web of Science.

    For clarity, we will refer to these topics as Thread 1 and Thread 2 in the following response.

    Thread 1 seems rooted in a comment made by Reviewer 1, which is reproduced below:

    I am still struck that there was an analysis of only trials where <3 individuals are present. If rank was important, I would imagine that behavior might be different in social contexts when theft, scrounging, policing, aggression, or other distractions might occur-- where rank would have effects on foraging behavior. Maybe lower rankers prioritize rapid food intake then. If rank should be related to investment in this behavior, we might expect this to be magnified (or different) in social contexts where it would affect foraging. It might just be that the data was too hard to score or process in those settings, or the analysis was limited. Additionally, I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal. Since rank is central to the interpretation of these results, I think that reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.

    We are grateful for this perspective of Reviewer 1, but it puts us in an uncomfortable position. We must respond rather forcefully because of its influence on the above assessment. A problem with R1’s comment is that it uses the word “foraging” (a behavior we did not study) instead of “cleaning” (the behavior we did study). Still, we can substitute the latter word for the former to get the gist of it.

    R1 criticizes our methods as a prelude for imagining the behaviors of our study animals, a form of conjecture. R1 correctly supposes a positive relationship between the number of animals and the intensity of competition for a limited food resource, a well-known phenomenon; and, yes, the food in each trial was decidedly limited, being fixed at nine cucumber slices. But R1 incorrectly presumes rank effects on cleaning under conditions of intense food competition. When the number of monkeys participating in a trial exceeded the number of feeding stations (n = 3), we saw little or no cleaning effort, either brushing or washing. So, rank effects on cleaning are immaterial under these conditions. As our study goals were narrowly focused on detecting individual propensities, or choices, as a function of rank, we limited our analysis to trials involving three monkeys or fewer. In retrospect, we admit that we should have provided better justification for our choice of trials, so we’ve edited one of our sentences:

    Original sentence

    Formerly lines 219-220: To minimize the potential confounding effects of dominance interactions, we analyzed trials with ≤ 3 monkeys.

    Revised sentence

    Current lines 219-224: We excluded trials from analysis if the number of participating monkeys exceeded the number of feeding stations, as these conditions produced high levels of feeding competition with scant cleaning behavior. Such conditions effectively erased individual variation in sand removal, the topic motivating our experiment. Accordingly, we analyzed trials with ≤ 3 monkeys, putting 937 food-handling bouts into the GLMM statistical models, which included data on individual rank, sex, and sand treatment.

    R1’s final criticism – “I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal” – seems to imply that rank data were collected during our experiment. On the contrary, we determined ranks from five years of focal follows preceding the experiment, achieving the very standard that R1 describes as ideal. The relevant text appeared on lines 165-169 in version 2.0:

    To determine the rank-order of adults, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). In some cases, these data were supplemented with ad libitum observations. This protocol existed during five years (2013-2018) of continual observations before we conducted our experiment in July-August 2018.

    Naturally, we were puzzled by R1’s dismissal of our methods, as well as R1’s conclusion, reached without evidence, that “[the] reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.” It is an unsubstantiated assertion with no definition of robustness, making it difficult for anyone to objectively assess the quality of our data.

    We detect in R1’s words some unfamiliarity with the social organization of our study species, which is fair enough. To better orient readers to the dominance hierarchy of Macaca fascicularis, and to boost reader confidence in the volume and quality of our rank data, we have added several sentences to this section of the manuscript, lines 169-183:

    Macaques form multi-male multi-female (polygynandrous) social groups with individual dominance hierarchies. In M. fascicularis, the hierarchy is strictly linear and extremely steep, meaning aggression is unidirectional (de Waal, 1977; van Noordwijk and van Schaik, 2001) with profound asymmetries in outcomes for individuals of adjacent ranks (Balasubramaniam et al., 2012). Further, the dominance hierarchies of philopatric females are stable and predictable. Daughters follow the pattern of youngest ascendancy, ranking just below their mothers with few known exceptions among older sisters (de Waal, 1977; van Noordwijk and van Schaik, 1999). Taken together, these species traits are conducive to unequivocal rank determinations.

    To determine the rank-order of adults in our study group, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). These data were supplemented with ad libitum observations and all rank determinations were updated monthly, and when males immigrated or emigrated. This protocol predates our experiment in July-August 2018, representing 970 hr of focal data during five years of systematic study (2013-2018).
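The logic of deriving a linear rank order from dyadic outcomes can be sketched in a few lines of Python. This is a minimal illustration only: the text above does not state which ranking index was applied, so the rule used here (order individuals by the number of group-mates they beat more often than they lose to) is an assumption, as are the function name `rank_order` and the toy individuals A, B, and C.

```python
from collections import defaultdict

def rank_order(interactions):
    """Order individuals from most to least dominant.

    `interactions` is an iterable of (winner, loser) pairs, one per
    dyadic agonistic interaction. Assumed rule (hypothetical, not taken
    from the manuscript): an individual dominates a group-mate if it won
    more of their encounters than it lost, and individuals are sorted by
    how many group-mates they dominate.
    """
    wins = defaultdict(lambda: defaultdict(int))
    individuals = set()
    for winner, loser in interactions:
        wins[winner][loser] += 1
        individuals.update((winner, loser))

    def dominated(a):
        # Count group-mates that `a` beat more often than it lost to.
        return sum(1 for b in individuals
                   if b != a and wins[a][b] > wins[b][a])

    # Ties are broken alphabetically so the output is deterministic.
    return sorted(individuals, key=lambda a: (-dominated(a), a))

# Toy data: a strictly linear hierarchy A > B > C.
toy = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "B")]
order = rank_order(toy)
```

In a strictly linear hierarchy with unidirectional aggression, as described for M. fascicularis above, this simple count recovers the full order whenever every dyad has been observed at least once; sparser data would call for a more formal index.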

    Thread 2 criticizes our evidence for adaptive investment and fitness, describing it as a limitation of our experimental design. Accordingly, the totality of our experiment was classified as “incomplete.” Yet, our experiment was never designed to collect such evidence, and we make no claims of having it. Rather, we discussed potential fitness consequences to highlight the broader significance of our study, connecting it to diverse bodies of literature, from evolutionary theory to paleoanthropology. Our intent was to follow the conventions of scientific writing; to put our results into conversation with the wider literature and set an agenda for future research.

    On reflection, Thread 2 seems to pivot around something as arbitrary as structure. Previously, our results and discussion were combined under a single section header (“Results and Discussion”), a stylistic choice to economize words. Our manuscript is a Short Report, which is limited to 1,500 words of main text. But this level of concision proved counterproductive. It blurred our results and discussion in the minds of readers. Indeed, Reviewer 3 described it as “misleading,” a barbed word that accomplishes the same act attributed to us. To counter this perspective, we have simply partitioned our Results (now “Experimental Results”) and Discussion to draw a sharper distinction between the two components of our paper.

  4. eLife Assessment

    This is a valuable study that tests the functional role of food-washing behavior in removing tooth-damaging sand and grit in long-tailed macaques and whether dominance rank predicts level of investment in the behavior. The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank, including the fitness-relevance and ultimate evolutionary implications of the findings, is incomplete given limitations of the experimental design. Overall, the paper should be of interest to researchers interested in foraging behavior, cognition, and primate evolution.

  5. Reviewer #1 (Public review):

    In this paper, the authors had 2 aims:

    (1) Measure macaques' aversion to sand and see if its removal is intentional, as it is likely an unpleasurable sensation that causes tooth damage.

    (2) Show that or see if monkeys engage in suboptimal behavior by cleaning foods beyond the point of diminishing returns, and see if this was related to individual traits such as sex and rank, and behavioral technique.

    They attempted to achieve these aims through a combination of geochemical analysis of sand, field experiments, and comparing predictions to an analytical model.

    The authors' conclusions were that they verified a long-standing assumption that monkeys have an aversion to sand as it contains many potentially damaging fine grained silicates, and that removing it via brushing or washing is intentional.

    They also concluded that monkeys will clean food for longer than is necessary, i.e. beyond the point of diminishing returns, and that this is rank-dependent.

    High and low-ranking monkeys tended not to wash their food, but instead over-brushed it, potentially to minimize handling time and maximize caloric intake, despite the long-term cumulative costs of sand.

    This was interpreted through the *disposable soma hypothesis*, where dominants maximize immediate needs to maintain rank and increase reproductive success at the potential expense of long-term health and survival.

    Strengths:

    The field experiment seemed well designed, and their quantification of the physical and mineral properties of quartz particles (relative to human detection thresholds) seemed good relative to their feret diameter and particle circularity (to a reviewer that is not an expert in sand). The *Rank Determination* and *Measuring Sand* sections were clear.

    In achieving Aim 1, the authors validated a commonly interpreted, but unmeasured function, of macaque and primate behavior-- a key study/finding in primate food processing and cultural transmission research.

    I commend their approach in trying to develop a quantitative model to generate predictions to compare to empirical data for their second aim.
    This is something others should strive for.

    I really appreciated the historical context of this paper in the introduction and found it very enjoyable and easy to read.

    I do think that interpreting these results in the context of the *disposable soma hypothesis* and the potential implications in the *paleolithic matters* section about interpreting dental wear in the fossil record are worthwhile.

    Weaknesses:

    Several of my concerns in an earlier review were addressed in revision, which I appreciate. One thing I think could strengthen this paper is a clearer link to social foraging theory to explore heterogeneity in handling times (as the currency they are trying to maximize).

    I am satisfied with the improvements in statistics and that I can access the code and data.

    I am still struck that there was an analysis of only trials where <3 individuals are present. If rank was important, I would imagine that behavior might be different in social contexts when theft, scrounging, policing, aggression, or other distractions might occur-- where rank would have effects on foraging behavior. Maybe lower rankers prioritize rapid food intake then. If rank should be related to investment in this behavior, we might expect this to be magnified (or different) in social contexts where it would affect foraging. It might just be that the data was too hard to score or process in those settings, or the analysis was limited. Additionally, I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal. Since rank is central to the interpretation of these results, I think that reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.

    While some of the boxes about raccoons and Concorde Fallacy were interesting, they did feel like a bit of a distraction from the main message in the paper.

  6. Reviewer #3 (Public review):

    This revised paper provides evidence that food washing and brushing in wild long-tailed macaques are deliberate behaviors to remove sand that can damage tooth enamel. The demonstration of the immediate functional importance of these behaviors is nicely done, and there is some interesting initial evidence that macaques differ systematically in their investment in food cleaning based on dominance rank.

    The authors interpret this evidence as support for "disposable soma" effects: that reduced time and effort invested in food washing in high-ranking individuals is attributable to prioritizing reproductive effort. Given that the analysis is on a single group with no longitudinal data, there are no fitness measures or fitness proxies, the energetic constraints faced by this population are not clear, and both sexes are combined into a single dominance hierarchy (trade-offs between different forms of investment are typically thought to differ between sexes), this conclusion is premature, although an interesting foundation for future studies.

    More generally, the results directly supported by the data collection and analysis (grit on Koshima likely damages macaque teeth; processing food helps mitigate the damage; there is some interesting interindividual variation in food processing time, and that time is not always in line with what appears to be optimal) tend to be combined with interpretation that is much more speculative (e.g., the effect sizes observed are consequential for fitness; high-ranking animals are making choices that optimize their long-term fitness at the expense of their soma). This is in part a stylistic choice but can have the effect of drawing attention away from the stronger empirical findings and/or be misleading. Similarly, although I appreciate that the authors were trying to interpret and respond to previous feedback from reviewers, I found the addition of the box text on the raccoon nomenclature and on irrational behavior and the Concorde effect distracting (more intro-textbook style than journal article style).

  7. Author response:

    The following is the authors’ response to the original reviews.

    We thank the reviewers for their constructive criticism. It is rare and gratifying to receive such thoughtful feedback, and the result is a much stronger paper. We made significant changes to our statistical analyses and figures to better differentiate the effects of sex and dominance rank on food-cleaning behaviors. These revisions uphold our original conclusion––that rank-related variation overwhelms any sex difference in cleaning behavior. We hope that these edits, together with the rest of our responses, provide a convincing demonstration of the tradeoffs of eliminating quartz from food surfaces.

    Reviewer #1 (Public Review):

    Summary

    We have no objections to Reviewer 1’s summary of our manuscript.

    Strengths

    Reviewer 1 is extremely gracious, and we are grateful for the kind words.

    Weaknesses

    Reviewer 1 identified several weaknesses, enumerating three types: (1) statistics, (2) insufficient links to foraging theory, and (3) interpretation and validity of the model. The present response is organized around these same categories.

    (1) Statistics

    We put all of our data and code into the Zenodo repository prior to submission. This content should have been accessible to Reviewer 1 from the outset. But in any event, we are very sorry for the mix-up. To ensure access to our data and code during the present stage of review, we included the URL in the main manuscript and here: https://doi.org/10.5281/zenodo.14002737

    (a) AIC and outcome distributions

    Reviewer 1 criticized our use of AIC for determining model selection. We agree and this aspect of our manuscript is now removed. In lieu of AIC, we produced two data sets consisting of whole number counts (seconds) with means <5. The data were right-skewed due to high concentrations of biologically-meaningful zeros (i.e., bouts of food handling without any cleaning effort). Following the recommendations of Bolker et al. (2008) and others (Brooks et al. 2017, 2019), we chose an outcome distribution (zero-inflated Poisson, see response below) that best matched this data distribution. In addition, we evaluated the post-hoc performance of each of our models using the standardized residual diagnostic tools for hierarchical regression models available in the DHARMa package (Hartig, 2022). To further evaluate our choice of outcome distribution, we generated QQ-plots and residual vs. predicted plots for each model and included them in our revision as Figures S3-S5.

    (b) zeros

    Reviewer 1 expressed concern over our treatment of biologically-meaningful zeros, and recommended use of a zero-inflated GLMM with either a Poisson or negative binomial outcome distribution. We agree that such models are best for our two data sets. Accordingly, we fit a series of zero-inflated generalized linear mixed models (ZIGLMM) using the glmmTMB package in R, each with a logit-link function, a single zero-inflation parameter applying to all observations, and a Poisson error distribution. For the food-brushing model, we fit a zero-inflated Poisson (ZIP), which produced favorable standardized residual diagnostic plots with no major patterns of deviation (Figure S3) and minor, but non-significant underdispersion (DHARMa dispersion statistic = 0.99, p = 0.80). For our two food-washing models, we used zero-inflated models with Conway-Maxwell Poisson (ZICMP) distributions, an error distribution chosen for its ability to handle data that are more underdispersed (DHARMa dispersion statistic = 8.2E-09, p = 0.74) than the standard zero-inflated Poisson (Brooks et al. 2019). Using this error distribution improved residual diagnostic plots over a standard ZIP model and we view any deviations in the standardized residuals as minor and attributable to the smaller sample size of our food-washing data set (see Figures S4 and S5) (Hartig, 2022). We reported the summarized fixed effects tests for each GLMM in Tables S1-S3 as Analysis of Deviance Tables (Type II Wald chi square tests, one-sided) along with 𝜒² values, degrees of freedom, and p-values (one-sided tests). Full model summaries with standard errors and confidence intervals are also included in Tables S4-S6. For all statistical analyses, we set 𝛼 = 0.05.
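For readers unfamiliar with zero inflation, the structure of a zero-inflated Poisson can be sketched in plain Python. This is not the glmmTMB code used for the analysis (that code is in the Zenodo repository), and it omits the mixed-model components such as random intercepts; the function names and parameter values are illustrative assumptions. The only thing it shows is the standard ZIP mixture: with probability `pi` an observation is a structural zero (for example, a bout with no cleaning effort), and otherwise the count follows a Poisson distribution with rate `lam`.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    lam: Poisson rate of the count component (lam > 0).
    pi:  probability of a structural (extra) zero, 0 <= pi < 1.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise either structurally or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

def zip_loglik(counts, lam, pi):
    """Log-likelihood of integer counts under a ZIP(lam, pi) model."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)

def zip_mean(lam, pi):
    """Expected value of a ZIP(lam, pi) variable."""
    return (1.0 - pi) * lam

# Hypothetical cleaning times in whole seconds, with many zeros.
example_counts = [0, 0, 0, 2, 3, 0, 1, 0, 4, 0]
example_ll = zip_loglik(example_counts, lam=2.0, pi=0.3)
```

Fitting such a model means choosing `lam` and `pi` (and, in a GLMM, their dependence on predictors such as rank and sex) to maximize this log-likelihood.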

    (2) Absence of Links to Foraging Theory

    This critique has three components. The first revisits the absence of code for the optimal cleaning time model. This omission was an unfortunate error at the moment of submission, but our code is available now as a Mathematica notebook in Zenodo (https://doi.org/10.5281/zenodo.14002737). The second pivots around our scholarship, admonishing us for failing to acknowledge the marginal value theorem of Charnov (1976). It is a fair point and we have corrected the oversight with a citation to this classic paper. The third criticism is also rooted in scholarship, with Reviewer 1 asking for greater connection to the existing literature on optimal foraging theory, a point echoed in the summary assessment of the editors at eLife. This comment and the weight given to it by eLife’s editors put us in a difficult spot, as our paper is focused on the optimization of delayed gratification, not food acquisition per se. So, we are in the awkward position of gently resisting this recommendation while simultaneously agreeing with Reviewer 1 that we need to better situate our findings in the landscape of existing literature. To thread this needle, we produced Box 2 with a photograph and 410 words. This display box puts our findings into direct conversation with recent research focused on the sunk cost fallacy.

    (3) Interpretation and validity of model relative to data

    This critique is focused on the simulated brushing and washing results reported in Figure S1, along with its captioning, which was inadequate. We edited the caption to identify the author (JER) who simulated the brushing and washing behaviors of the monkeys. In addition, we clarified the number of brushing replicates (3) and washing replicates (3) for each of three treatments, for a total of 18 simulations.

    We followed Reviewer 1’s suggestion, incorporating the experimental uncertainty of grit removal into our optimal cleaning time model. We drew % grit removed values from a distribution, discounting the rare event when values ≥ 100% were drawn. As the % grit removed is used to estimate the cleaning inefficiency parameter 𝑐 for brushing and washing, the included uncertainty now allows us to evaluate these parameters as distributions; and, in turn, obtain a distribution for our predicted brushing and washing optimal cleaning times. As we now describe in the main text, the optimal cleaning times for brushing and washing are 𝑡* = 0.98 ± 0.19 s and 𝑡* = 2.40 ± 0.74 s, respectively. We are grateful for Reviewer 1’s suggestion, for it added valuable context to our model predictions. Notably, the inclusion of experimental uncertainty did not change the qualitative nature of our results, or the interpretations of our model predictions compared to observed cleaning behaviors.

    We chose to exclude variability in handling time h when generating predicted cleaning time optima, at least in the main text. Our reasoning stems from the observation that handling time variability is long-tailed, with the longer handling times associated with behaviors that we do not account for in our analysis. For example, individuals carrying multiple cucumber slices to the ocean were apt to drop them, struggling at times to re-grasp so many at once. Such moments increased handling times substantially. Still, we acted on Reviewer 1’s suggestion, accounting for the tandem effects of handling time variability and uncertainty in % grit removed (see Figure S6). Drawing handling time estimates from a log-normal distribution fitted to the handling time data, we found that these dual sources of uncertainty did not qualitatively change our results. They added further uncertainty to the predicted washing time, but the mean remains roughly equivalent. (We note that brushing is assumed to have a constant handling time––composed of only assessment time and no travel––such that the results for brushing do not change.) Both analyses are included in the Mathematica notebook in Zenodo (https://doi.org/10.5281/zenodo.14002737).
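The general idea of propagating parameter uncertainty into a predicted optimum can be illustrated with a generic marginal-value-style calculation in Python. This is a sketch under stated assumptions and is not the model in the Mathematica notebook: the saturating gain function `1 - exp(-t / c)`, the rate currency `gain / (h + t)`, the normal distribution for `c`, and every numerical value are hypothetical choices made only for illustration. The symbols `c` (cleaning inefficiency) and `h` (handling time) are borrowed from the text; their exact role in the published model may differ.

```python
import math
import random
import statistics

def gain(t, c):
    """Assumed fraction of grit removed after cleaning for t seconds."""
    return 1.0 - math.exp(-t / c)

def rate(t, c, h):
    """Assumed currency: grit removed per unit of total time (h > 0)."""
    return gain(t, c) / (h + t)

def optimal_time(c, h, lo=0.0, hi=60.0, tol=1e-6):
    """Golden-section search for the cleaning time that maximizes rate.

    Under the assumed gain function the rate has a single peak, so the
    search converges to the marginal-value optimum if it lies in [lo, hi].
    """
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - phi * (b - a)
        x2 = a + phi * (b - a)
        if rate(x1, c, h) < rate(x2, c, h):
            a = x1  # the peak lies to the right of x1
        else:
            b = x2  # the peak lies to the left of x2
    return (a + b) / 2.0

def propagate(c_mean, c_sd, h, n=2000, seed=0):
    """Draw c from a normal distribution, discarding invalid draws,
    and return the mean and standard deviation of the optimum."""
    rng = random.Random(seed)
    optima = []
    while len(optima) < n:
        c = rng.gauss(c_mean, c_sd)
        if c <= 0:
            continue  # discard impossible parameter values
        optima.append(optimal_time(c, h))
    return statistics.mean(optima), statistics.stdev(optima)

# Hypothetical parameter values, chosen only to exercise the code.
t_mean, t_sd = propagate(c_mean=1.0, c_sd=0.2, h=2.0, n=200)
```

Discarding non-positive draws of `c` plays the same role as discounting draws of ≥ 100% grit removed in the text: impossible values are excluded before the optimum is computed. In this sketch a longer handling time `h` always shifts the optimum later, the familiar marginal value theorem result (Charnov, 1976).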

    Reviewer #2 (Public Review):

    Summary

    We have no objections to Reviewer 2’s summary of our manuscript.

    Strengths

    Reviewer 2 is extremely gracious, and we are grateful for the kind words.

    Weaknesses

    Reviewer 2 noted that our manuscript failed to provide “sufficient background on [our study] population of animals and their prior demonstrations of food-cleaning behavior or other object-handling behaviors (e.g., stone handling).” To address this comment, we edited the introduction (lines 56-58) to alert readers to the onset of regular food-cleaning behaviors sometime after December 26, 2004. In addition, we edited our methods text (lines 155-160) to highlight the onset and limited scope of prior research with this study population:

    “The animals are well habituated to human observers due to regular tourism and sustained study since 2013 (Tan et al., 2018). Most of this research has revolved around stone tool-mediated foraging on mollusks, the only activity known to elicit stone handling (Malaivijitnond et al., 2007; Gumert and Malaivijitnond, 2012, 2013; Tan et al., 2015), although infants and juveniles will sometimes use stones during object play (Tan, 2017). There has been no prior examination of food-cleaning behaviors.”

    Reviewer #3 (Public Review):

    Reviewer 3 identified three weaknesses, which we address in three paragraphs.

    Reviewer 3 questioned our methods for determining rank-dependent differences in cleaning behavior, arguing that our conclusions were unsupported. It is a fair point, and it compelled us to combine males and females into a single standardized ordinal rank of 24 individuals. This unified ranking is now reflected in the x-axes of Figure 2 and Figure S2. Plotting the data this way––see Figure S2––underscores Reviewer 3’s concern that sex and dominance rank are confounding variables. To address this problem, our GLMM included rank and sex as predictor variables, which controls for the effect of sex when assessing the relationship between rank and cleaning time across the three treatments. Reported in Tables S1-S3, these findings show that the effect of sex on either brushing or washing time was not significant. This result bolsters our original contention that rank-related variation in cleaning time overwhelms any sex differences.

    Relatedly, Reviewer 3 questioned our conclusions on the effects of rank because our study was focused on a single social group. In other words, it is plausible that our results were heavily influenced by the idiosyncrasies of select individuals, not dominance rank per se. It is a fair point, and it compelled us to include individual ID as a random effect in each of our GLMMs. Including individual ID as a random intercept allowed us to control for inter-individual variation in cleaning duration while assessing the effects of rank. An analysis based on additional social groups or longitudinal data is certainly desirable, but also well beyond the scope of a Short Report for eLife.
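
    As a rough illustration of what a random intercept does, the sketch below writes out the expected cleaning time under a log link with fixed effects for rank and sex plus a per-individual offset. The log link, the coefficient values, and every function name are assumptions chosen for illustration; they are not the fitted model or the estimates reported in Tables S1-S3.

```python
import math
import random


def expected_duration(rank, is_male, offset, b0, b_rank, b_sex):
    """Expected cleaning time (s) under an assumed log link.

    offset is the individual's random intercept; b0, b_rank and b_sex are
    hypothetical fixed-effect coefficients, not estimates from the study.
    """
    return math.exp(b0 + b_rank * rank + b_sex * is_male + offset)


def draw_random_intercepts(n_individuals, sd, seed=0):
    """Draw one offset per individual from Normal(0, sd)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd) for _ in range(n_individuals)]


# 24 ranked individuals, as in the unified hierarchy; all other numbers
# below are placeholders.
offsets = draw_random_intercepts(n_individuals=24, sd=0.3)
predictions = [
    expected_duration(rank=r, is_male=0, offset=offsets[r - 1],
                      b0=1.0, b_rank=-0.05, b_sex=0.0)
    for r in range(1, 25)
]
```

    In this sketch, two individuals of the same rank and sex can differ in expected cleaning time only through their offsets, which is the inter-individual variation a random intercept is meant to absorb.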

    Finally, Reviewer 3 objected to fragments of sentences in our abstract, introduction, and discussion, combining them into a criticism of claims that we did not and do not make. It probably wasn’t intentional, but it puts us in the awkward position of deconstructing a strawman:

    ● Reviewer 3 begins, “there is no evidence presented on the actual fitness-related costs of tooth wear or the benefits of slightly faster food consumption”. This statement is true while insinuating that collecting such evidence was our intent. To be clear, our experiment was never designed to measure tooth wear or reproductive fitness, nor do we make any claims of having done so.

    ● Reviewer 3 adds, “Support for these arguments is provided based on other papers, some of which come from highly resource-limited populations (and different species). But this is a population that is supplemented by tourists with melons, cucumbers, and pineapples!” We were puzzled over these sentences. The first fails to mention that the citations exist in our discussion. Citing relevant work in a discussion is a basic convention of scientific writing. But it seems the underlying intent of these words is to denigrate the value of our study population because two dozen tourists visit Koram Island once a day. Exclamations to the contrary, the amount of tourist-provisioned food in the diet of any one monkey is negligible.

    ● Last, Reviewer 3 commented on matters of style, objecting to “overly strong claims.” We puzzled over this criticism because the claims in question are broader points of introduction or discussion, not results. The root problem appears to be the final sentence of our abstract:

    “Dominant monkeys abstained from washing, balancing the long-term benefits of mitigating tooth wear against immediate energetic requirements, an essential predictor of reproductive fitness.”

    This sentence has three clauses. The first is a statement of results, whereas the second and third are meant to mirror our discussion on the importance of our findings. We combined the concepts into a single concluding sentence for the sake of concision, but we can appreciate how a reader could feel deceived, expecting to see data on tooth wear and fitness. So, our impression is that we are dealing with a simple misunderstanding of our own making, and that this single sentence explains Reviewer 3’s criticism and tone––it cast a long shadow over the substance of our paper. To resolve this problem, we edited the sentence:

    “Dominant monkeys abstained from washing, a choice consistent with the impulses of dominant monkeys elsewhere: to prioritize rapid food intake and greater reproductive fitness over the long-term benefits of prolonging tooth function.”

  8. eLife assessment

    This valuable study tests the functional role of food-washing behavior in removing tooth-damaging sand and grit in long-tailed macaques and whether dominance rank predicts the level of investment in the behavior. The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank is incomplete given confounding between sex and rank and limited sample size. A more careful and perhaps restrained interpretation of the findings, as well as a connection to the existing literature on optimal foraging theory, would increase the value of the study to its intended audience, i.e. researchers interested in foraging behavior, cognition, and primate evolution.

  9. Reviewer #1 (Public Review):

    Summary:

    In this paper, the authors had 2 aims:

    (1) Measure macaques' aversion to sand and see if its removal is intentional, as it likely results in an unpleasant sensation and causes tooth damage.

    (2) Test whether monkeys engage in suboptimal behavior by cleaning foods beyond the point of diminishing returns, and see if this was related to individual traits such as sex and rank, and to behavioral technique.

    They attempted to achieve these aims through a combination of geochemical analysis of sand, field experiments, and comparing predictions to an analytical model.

    The authors' conclusions were that they verified a long-standing assumption that monkeys have an aversion to sand as it contains many potentially damaging fine-grained silicates and that removing it via brushing or washing is intentional.

    They also concluded that monkeys will clean food for longer than is necessary, i.e. beyond the point of diminishing returns, and that this is rank-dependent.

    High and low-ranking monkeys tended not to wash their food, but instead over-brushed it, potentially to minimize handling time and maximize caloric intake, despite the long-term cumulative costs of sand.

    This was interpreted through the *disposable soma hypothesis*, where dominants maximize immediate needs to maintain rank and increase reproductive success at the potential expense of long-term health and survival.

    Strengths:

    The field experiment seemed well-designed, and their quantification of physical and mineral properties of quartz particles (relative to human detection thresholds) seemed good relative to their Feret diameter and particle circularity (to a reviewer who is not an expert in sand). The *Rank Determination* and *Measuring Sand* sections were clear.

    In achieving Aim 1, the authors validated a commonly interpreted but unmeasured function of macaque and primate behavior, a key finding for research on primate food processing and cultural transmission.

    I commend their approach in developing a quantitative model to generate predictions to compare to empirical data for their second aim.

    This is something others should strive for.

    I really appreciated the historical context of this paper in the introduction, and found it very enjoyable and easy to read.

    I do think that interpreting these results in the context of the *disposable soma hypothesis* and the potential implications in the *paleolithic matters* section about interpreting dental wear in the fossil record are worthwhile.

    Weaknesses:

    Most of the weaknesses in this paper lie in statistical methods, visualization, and a missing connection to the marginal value theorem and optimal foraging theory.

    I think all of these weaknesses are solvable.

    The data and code were not submitted. Therefore I was unable to better understand the simulation or to provide useful feedback on the stats, the connection between the two, and its relevance to the broader community.

    (1) Statistics:

    (a) AIC and outcome distributions

    The use of AIC for hierarchical models, and models with different outcome distributions brought up several concerns.

    The authors appear to use AIC to help inform which model to use for their primary analyses in Tables S1 and S2. It is unclear which of these models are analyzed in Tables S3 and S4.

    AIC should not be used on hierarchical models, and something like WAIC (or DIC which has other caveats) would be more appropriate.

    Also, using information criteria on Mixture Models like Negative Binomials (aka Gamma-Poisson) should be done with extreme caution, or not at all, as the values are highly sensitive to the data structure.

    Some researchers also say that information criteria should not be used to compare models with different outcome distributions - although this might be slightly less of a concern as all of your models are essentially variations on a Poisson GLM.

    Discussion on this can be found in McElreath Statistical Rethinking (Section 12.1.3) and Gelman et al. BDA3 (Chapter 7).

    Choosing an outcome distribution based on your understanding of the data-generating process is a better approach than relying on AIC, especially in this context, where it can be misleading.

    (b) Zeros

    I also had some concerns about how zeros were treated in the models.

    In lines 217-218, they mentioned that "if a monkey consumed a cucumber slice without brushing or washing it, the zero-second duration was included in both GLMMs."

    This zero implies no processing and should not be treated as a length 0 duration of processing.

    This suggests to me that a zero-inflated Poisson or zero-inflated negative binomial would be the best choice for modelling the data, as it is essentially a 2-step process:
    (i) Do they process the cucumber at all?
    (ii) If so do they wash or brush, and how is this predicted by rank and treatment?
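
    The two-step logic above can be sketched as a zero-inflated Poisson likelihood, in which a zero arises either because the food was not processed at all (a structural zero) or because the count process happened to return zero. This is a minimal sketch under stated assumptions: durations are treated as whole seconds, the function and parameter names (`zip_pmf`, `pi_zero`, `lam`) are hypothetical, and the example values are placeholders, not data from the study.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """P(K = k) under a zero-inflated Poisson.

    pi_zero: probability of a structural zero (no processing at all).
    lam: Poisson mean for bouts in which processing can occur.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from either step of the process.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_log_likelihood(durations, pi_zero, lam):
    """Sum of log-probabilities for a list of non-negative integer durations."""
    return sum(math.log(zip_pmf(k, pi_zero, lam)) for k in durations)


# Placeholder durations in whole seconds; NOT observations from the experiment.
example_durations = [0, 0, 0, 2, 3, 1, 0, 4, 2, 0]
ll_zero_inflated = zip_log_likelihood(example_durations, pi_zero=0.4, lam=2.5)
ll_plain_poisson = zip_log_likelihood(example_durations, pi_zero=0.0, lam=1.2)
```

    Setting `pi_zero=0.0` recovers an ordinary Poisson model, so the same function can score both alternatives on the same data. In practice both parameters would be modelled as functions of predictors such as rank and treatment, which this sketch omits.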

    (2) Absence of Links to Foraging Theory

    Optimal cleaning time model: the optimality model was not well described, including how it was programmed. A better description and documentation of this model, along with code (Mathematica, judging from the plot?), is needed.

    There seems to be much conceptual and theoretical overlap with foraging theory models that were not well described - namely the *marginal value theorem (Charnov, 1976; Krebs et al., 1974) and its subsequent advances* (see https://doi.org/10.1016/j.jaa.2016.03.002 and https://doi.org/10.1086/283929 for examples).

    In the suggestions, I attached the R code where I replicated their model to show that it is *mathematically identical to the marginal value theorem*. This was not mentioned at all in the text or citations.

    This literature has been well studied since the 1970s, and there is a history of studies that compare behavior to an optimality model and find (or fail to find) instances where animals conform to or diverge from its predictions (https://doi.org/10.1146/annurev.es.15.110184.002515). This link should be highlighted, and interpreting it in that theoretical context will make it more broadly applicable to behavioral ecologists.
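
    To make the link to the marginal value theorem concrete, the sketch below finds the cleaning time that maximizes benefit per unit of total time when sand removal saturates. It is an illustration under stated assumptions, not the authors' model or the reviewer's R code: the saturating gain function t / (t + h), the fixed time cost per food item, and all names and numbers are hypothetical.

```python
import math


def gain(t, h):
    """Fraction of sand removed after t seconds (assumed saturating form)."""
    return t / (t + h)


def rate(t, h, fixed_cost):
    """Benefit per unit of total time: cleaning time plus a fixed cost per item."""
    return gain(t, h) / (fixed_cost + t)


def optimal_time(h, fixed_cost):
    """Closed-form maximizer of rate() for this particular gain function."""
    return math.sqrt(h * fixed_cost)


def optimal_time_numeric(h, fixed_cost, t_max=120.0, step=0.01):
    """Grid-search check that does not rely on the closed form."""
    best_t = step
    best_r = rate(step, h, fixed_cost)
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        r = rate(t, h, fixed_cost)
        if r > best_r:
            best_t, best_r = t, r
    return best_t


# Placeholder parameters in seconds; NOT values estimated in the study.
h_example = 2.0
fixed_cost_example = 8.0
t_star = optimal_time(h_example, fixed_cost_example)
```

    At the optimum, the marginal gain h / (t + h)^2 equals the average rate, which is the defining condition of the marginal value theorem. Cleaning for longer than `t_star` still removes sand, but at a lower return per second than the forager could obtain by moving on to the next item.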

    The data were subset to include instances where there were < 3 monkeys present to avoid confounds of rank, but it is important to know that optimal behavior might vary by individual, and can change in a social context depending on rank (see https://doi.org/10.1016/j.tree.2022.06.010). Discussion of this, and further exploration of it in the data, would strengthen the overall contribution of this manuscript to the field, but I understand that the researchers wish to avoid that in this paper, as it is a complex topic, albeit one that this dataset is uniquely suited to address.

    (3) Interpretation and validity of model relative to data

    In lines 92-102, they present summary statistics (I think) showing that time spent brushing and washing is consistent with washing or brushing to remove sand.

    In the **mitigating tooth wear** section (line 73) and corresponding Figure S1 showing surface sand removed, more detail about how these numbers were acquired, and statistical modelling, is needed.

    This is important as uncertainty and measurement error around these metrics are key to the central finding and interpretation of Aim 2 in this paper.

    It appears that the researchers simulated the monkey's brushing and washing behaviors (similar to https://doi.org/10.1007/s10071-009-0230-3).

    How many researchers simulated monkey behavior and how many times?

    What are the repeat points in Figure S1?

    What is the number of trials or number of people?

    This effect appears stronger for washing than brushing as well - if so, why?

    More info about this data, and the uncertainty in this is important, as it is key to the second central claim of this paper.

    The estimates of removing between 76% +/- 7 and 93% +/- 4 of sand (visualized in Figure S1), are statistical estimates.

    I would find the argument more convincing if the conclusion that food is processed for too long, beyond the point of diminishing returns, still held after propagating the uncertainty in handling and sand removal rates and in the corresponding half-saturation constants.
    It is very possible that after accounting for uncertainty and variation in handling time and removal rates, the second result may not hold true.

    I was not able to convince myself of this via reanalysis as the description of the data in the text was not enough to simulate it myself.

    Essentially, this would imply that in Figure 3 the predicted value would have some variation around it (informed by boundary conditions of time being positive, and percentages having floors and ceilings) and that a range of predicted cleaning times (optimal give-up times) would be plotted in Figure 3.

    This could be accomplished with a Bayesian approach, or by simply plotting multiple predictions given some confidence interval around c and h.
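
    One simple way to produce such a range is Monte Carlo propagation: draw the uncertain parameters many times, recompute the predicted optimum for each draw, and report percentiles. The sketch below is only an illustration. It assumes a saturating gain function whose optimum is sqrt(h * c), treats c as a fixed time cost per food item (an assumption about the reviewer's notation), and uses placeholder means and standard deviations rather than estimates from the study.

```python
import math
import random


def sample_positive(rng, mean, sd):
    """Draw from a normal distribution, rejecting non-positive values."""
    while True:
        x = rng.gauss(mean, sd)
        if x > 0:
            return x


def predicted_time_interval(h_mean, h_sd, c_mean, c_sd, n_draws=5000, seed=1):
    """Return (2.5th percentile, median, 97.5th percentile) of predicted optima."""
    rng = random.Random(seed)
    draws = sorted(
        math.sqrt(sample_positive(rng, h_mean, h_sd) * sample_positive(rng, c_mean, c_sd))
        for _ in range(n_draws)
    )
    lower = draws[int(0.025 * (n_draws - 1))]
    median = draws[int(0.5 * (n_draws - 1))]
    upper = draws[int(0.975 * (n_draws - 1))]
    return lower, median, upper


# Placeholder inputs in seconds; NOT values reported in the manuscript.
low, mid, high = predicted_time_interval(h_mean=2.0, h_sd=0.5, c_mean=8.0, c_sd=1.5)
```

    Observed cleaning times could then be compared against the band from `low` to `high` rather than against a single predicted value; rejecting non-positive draws enforces the boundary condition that times are positive.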

  10. Reviewer #2 (Public Review):

    Summary:

    This field experiment aimed to assess what motivates macaque monkeys to clean food items prior to consumption and the relative costs and benefits of different cleaning approaches (manually brushing sand from food versus dousing food items in water). The experiment teases apart if/how the benefits of these approaches are mediated by the amount of debris on food and the monkeys' rank in terms of the costs of consuming sand versus the time and energy required to remove it. The authors not only examined the behavioral responses of wild macaques to three conditions of food sand contamination but also tested the relative costs of consuming different levels and sizes of sand particulates. Through this, the authors propose considerations of the macaques' motivations to clean food and the balance they take in energetic gains from consuming food versus the costs of cleaning food and consuming sand. Their data reveal that food washing is more effective in removing sand, but more costly than manually brushing off sand. This study also revealed that only mid-ranked monkeys washed their food, while high and low-ranked monkeys were more likely to remove sand via brushing it off food with their hands.

    Strengths:

    This study provides a very in-depth consideration of the motivations of macaques to clean their food, and the relative costs and benefits of different food cleaning techniques. Not only did the study test the behavior of wild macaques via a simple yet elegant field study, but they also performed a detailed analysis of the sand particulates to understand the level of potential tooth wear that consuming it could result in. By relying on a wild group of macaques that have been part of a long-term study site, the team also had detailed behavioral data on the population to allow for rank assessments of the animals. This comprehensive study provides important foundational information for a better understanding of how and why macaques clean food, which informs existing and future considerations of this as a potential cultural behavior.

    Weaknesses:

    As currently written, the paper does not provide sufficient background on this population of animals and their prior demonstrations of food-cleaning behavior or other object-handling behaviors (e.g., stone handling). Moreover, the authors' conclusions focus on the behavior of high-ranked animals, but subordinate animals also showed similar behavioral patterns and they should be considered in more detail too.

  11. Reviewer #3 (Public Review):

    This paper provides evidence that food washing and brushing in wild long-tailed macaques are deliberate behaviors to remove sand that can damage tooth enamel. The demonstration of the immediate functional importance of these behaviors is nicely done. However, the paper also makes the claim that macaques systematically differ in their investment in food cleaning because of rank-dependent differences in their costs and benefits. This latter conclusion is not, in my view, well-supported, for several reasons.

    First, as is typical in many primate studies, the authors construct sex-specific ordinal rank hierarchies. This makes sense since hierarchies for males and hierarchies for females are determined by different processes and have different consequences. However, if I understand it correctly, they are then lumped together in all statistical analyses of rank, which makes the apparent rank effect very difficult to understand. The challenge of interpretation is increased because there are twice as many adult females in the group as adult males, so the rank is confounded by sex (because all low-rank values are adult females).

    Second, because only one social group is being studied, the conclusions about rank may be heavily driven by individual identity, not rank per se. An analysis involving replicate social groups (which granted, may be impossible here) or longitudinal data showing a change in behavior following a change in rank would be much more compelling.

    Third, there is no evidence presented on the actual fitness-related costs of tooth wear or the benefits of slightly faster food consumption. Support for these arguments is provided based on other papers, some of which come from highly resource-limited populations (and different species). But this is a population that is supplemented by tourists with melons, cucumbers, and pineapples! In the absence of more direct data on fitness costs and benefits, the paper makes overly strong claims about the ability to explain its observations based on "immediate energetic requirements" (abstract), "difference...freighted with fitness consequences" (line 80), and "pressing energetic needs"/"live fast, die young" (lines 121-122--there is no evidence that tooth wear is associated with morbidity or mortality here). The idea that high-ranking animals are "sacrificing their teeth at the altar of high rank" seems extreme.