Shape-invariant perceptual encoding of dynamic facial expressions across species

Abstract

Dynamic facial expressions are crucial for communication in primates. Because of the difficulty of controlling the shape and dynamics of facial expressions across species, it is unknown how species-specific facial expressions are perceptually encoded and how they interact with the representation of facial shape. While popular neural-network theories predict a joint encoding of facial shape and dynamics, the neuromuscular control of faces evolved more slowly than facial shape, suggesting a separate encoding. To investigate this hypothesis, we developed photo-realistic human and monkey heads that were animated with motion-capture data from monkeys and humans. Exact control of expression dynamics was accomplished by a Bayesian machine-learning technique. Consistent with our hypothesis, we found that human observers learned cross-species expressions very quickly, with face dynamics represented independently of facial shape. This result supports the co-evolution of the visual processing and motor control of facial expressions, while it challenges popular neural-network theories of dynamic expression recognition.

Article activity feed

  1. ### Reviewer #2:

    This human psychophysics study claims to provide more evidence in support of the popular notion that visual processing of faces may involve partially independent processes for the analysis of static information, such as facial shape, versus dynamic information, such as facial expression. In this respect, the scientific hypotheses and conclusions are not novel, although some of the methods (parametric variation of facial expression dynamics using computer-generated animations) and analyses (Bayesian generative modeling of expression dynamics) are relatively new. Although the science is rigorously conducted, the paper currently feels heavy on statistics and technical details but light on data, compelling results, and clear interpretation. However, the main problem is that the study fails to provide sufficient controls to support its central claims as currently formulated.

    Concerns:

    1. A central claim of the paper and the first words in the title are that the behavior studied (categorization of facial expression dynamics) is "shape-invariant". However, the lack of variation in facial shapes (n = 2) used here limits the strength of the conclusions that can be drawn, and it certainly remains an open question whether representations of facial expression dynamics are truly "shape-invariant". A simple control would have been to vary the viewing angle of the avatars, in order to dissociate 3D object shapes from their 2D projections (images). The authors also claim that "face shapes differ considerably" (line 49) amongst primate species, which is clearly true in absolute terms. However, the structural similarity of simian primate facial morphology (i.e. humans and macaques used here) is striking when compared to various non-primate species, which naturally raises questions about just how shape-invariant facial expression recognition is. The lack of data to more thoroughly support the central claim is problematic.

    2. As the authors note, macaque and human facial expressions of 'fear' and 'threat' differ considerably in visual salience and motion content, both in 3D and in their 2D projections (i.e. optic flow). Indeed, the decision to 'match' expressions across species based on semantic meaning rather than physical muscle activations is a central problem here. Figure 1A clearly illustrates the relative subtlety of the human expression compared to the macaque avatar's extreme open-mouthed pose, while Fig 1D (right panels) shows that this is also true of macaque expressions mapped onto the human avatar. The authors purportedly controlled for this in an 'optic-flow equilibrated' experiment that produced similar results. However, this crucial control is currently difficult to assess, since the control stimuli are not illustrated and the description of their creation (in the supplementary materials) is rather convoluted and obfuscates what the actual control stimuli were.

    The results of this control experiment that are presented (hidden away in supplementary Fig S3C) show that subjects rated the equilibrated stimuli at similar levels of expressiveness for the human vs. macaque avatars. However, what the reader really needs to know is whether subjects rated the human vs. macaque expression dynamics as similarly expressive (irrespective of avatar). My understanding is that species expression (and not species face shape) is the variable that the authors were attempting to equilibrate.

    In short, the authors have not presented data to convince a reader that their equilibrated stimuli resolve the obvious confound in their original stimuli, namely the correlation between low-level visual salience (especially around the mouth region) and the species of the expression dynamics.

    3. This paper appears to be the human psychophysics component of work that the authors have recently published using the macaque avatar. The separate paper (Siebert et al., 2020, eNeuro) reported basic macaque behavioral responses to similar animations, while the task here takes advantage of the more advanced behavioral methods that are possible in human subjects. Nevertheless, the emphasis of the current paper on cross-species perception raises the question of how macaques perceive these stimuli. Do the authors have any macaque behavioral data for these stimuli (even if not for the 4AFC task) that could be included to round this out? If not, I recommend rewording the title, since its current grammatical structure implies that the encoding is "across species", whereas encoding of species (shape and expression) was only tested in one species (humans).
  2. ### Reviewer #1:

    Overall assessment:

    The strengths of this paper are the novel cross-species stimuli and very interesting behavioural findings, showing sharper tuning for recognising human expression sequences compared to monkey expressions. Technically, the paper is of very high quality, both in terms of stimulus creation and analysis. Appropriate control experiments have been run, and in my view the only concern is the way the results are presented, which I believe can be dealt with by restructuring the text. Other than that, I feel this would make a very nice contribution to the field.

    Concerns:

    The only major concern that I have is that the main take-home messages do not come through clearly in the way the Results section is currently structured. I found there was still too much technical detail - despite considerable use of Supplementary Information (SI) - which made extracting the empirical findings quite hard work. The details of the multinomial regression, the model comparisons (Table 1), and even the Discriminant Functions (Fig 2), for example, could all be briefly mentioned in the main text, with details provided in Methods or SI. These are all interesting, but I feel the focus should be on the behavioural findings, not the methods.

    I would suggest using the Discussion as a guide (it clearly states the key points), making sure the focus is more on Figure 3, and then working through the points more concisely.

    Obviously, this can be achieved simply by rewriting and does not take away from the significance of the work in any way. While the quality of the English is generally very high, some very minor wording issues could also be dealt with at this stage.

  3. ## Preprint Review

    This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    ### Summary:

    The paper employs novel cross-species stimuli and a well-designed psychophysical paradigm to study the visual processing of facial expressions. The authors show that facial expression discrimination is largely invariant to face shape (human vs. monkey). Furthermore, they reveal sharper tuning for recognising human expressions compared to monkey expressions, independent of whether these expressions were conveyed by a human or a monkey face. Technically, the paper is of very high quality, both in terms of stimulus creation and analysis.