Untangling the animacy organization of occipitotemporal cortex


Abstract

Some of the most impressive functional specialization in the human brain is found in occipitotemporal cortex (OTC), where several areas exhibit selectivity for a small number of visual categories, such as faces and bodies, and spatially cluster based on stimulus animacy. Previous studies suggest this animacy organization reflects the representation of an intuitive taxonomic hierarchy, distinct from the presence of face- and body-selective areas in OTC. Using human fMRI, we investigated the independent contribution of these two factors – the face-body division and taxonomic hierarchy – in accounting for the animacy organization of OTC, and whether they might also be reflected in the architecture of several deep neural networks. We found that graded selectivity based on animal resemblance to human faces and bodies masquerades as an apparent animacy continuum, which suggests that taxonomy is not a separate factor underlying the organization of the ventral visual pathway.

Article activity feed

  1. ### Reviewer #3

    Overview and general assessment:

    In further untangling the organisation of occipitotemporal cortex (OTC), this paper attempts to explain, using behavioural and categorical models, the graded representations of images of animal faces and bodies, and objects (plants), in OTC and the face-, body-, and object-selective regions within OTC. The data suggest two main results. One, the representations in OTC seem to be (independently) related to an animate-inanimate distinction, a face-body distinction, and a taxonomic distinction between the images. Two, the representations in the face- and body-selective regions in OTC are related to the face/body images' similarity to the human face/body, respectively, as gauged in a behavioural experiment. This similarity to the human face/body subsumes the variance in face/body-selective OTC related to the authors' model of taxonomic distinction. These observations are used to suggest that the graded responses to animal images in OTC reported by previous studies (termed the animacy continuum in some cases) might just be based on animal resemblance to human faces and bodies rather than on a taxonomy. The claims, if valid, are a major addition to the ongoing discussion about the nature and underlying principles of the organisation of object representations in high-level visual cortex.

    There might be a multitude of issues, outlined below, with the way the observations are used to support the authors' claims. Addressing those issues might help reveal whether the claims are indeed supported by the data, which would be crucial in deciding whether to publish the current version of this paper.

    Main concerns:

    On "OTC does not reflect taxonomy" (line 390): Observations in Figure 4 suggest that the variance in face/body-selective OTC explained by the taxonomy RDMs is for most part a subset of the variance explained by the human face/body similarity RDMs. This observation is used to suggest that "there is no taxonomic organisation in OTC" (line 423). Wouldn't such a statement be valid only if the taxonomy RDM did not explain any variance in OTC? Couldn't the observation that the variance it explains is also explained by human-similarity imply that the human-similarity is partly based on taxonomy? Also, the positive and strong correlation between the human-similarity RDMs and CNN RDMs in Figure 6 suggest that the human-similarity judgements reflect visual feature differences. However, how would you distinguish between the variance in the human-similarity RDM described by visual feature differences and by a more semantic concept such as taxonomy? Without disentangling these visuo-semantic factors (as done in Proklova et al. 2016 and Thorat et al. 2019) how could we be sure that OTC does not reflect taxonomy?

    On "OTC does not represent object animacy" (line 434): Figure 2 suggests that the animacy RDM is related to the OTC RDMs, even after factoring out the face/body and taxonomy RDM contributions. The point raised in the above section also makes it harder to suggest that animacy (the semantic part) is not represented in OTC. While the studies mentioned in the discussion are part of the ongoing debate on whether animacy is indeed represented in OTC, such a definitive statement seems out of place in the discussion in this paper where the data do not seem to suggest the absence of animacy in OTC.

    On "Deep neural networks do not represent object animacy" (line 468): "trained DNNs plausibly do not represent either a taxonomic continuum or a categorical division between animate and inanimate objects" (lines 487-488). In Figure 5 there is a clear negative correlation with the animacy RDM for most of the CNNs i.e. a "categorical distinction". Other models are not factored out in Figure 5 to suggest that the animacy RDM contribution is not unique as the statement suggests. Also, the way the CNNs are trained, they are not fed explicit animacy information so whatever variance is related to animacy as quantified by the categorical/behavioural models suggests that those models might be capitalising on visual feature differences. As such, indeed, CNNs do not represent animacy – but then that is a trivial statement – it seems they do represent visual feature differences which can be associated with animacy.

    Minor comments:

    (lines 53-54) "These studies equate the idea of a continuous, graded organisation in OTC with the representation of a taxonomic hierarchy" This is false. For example, in Thorat et al. 2019 this equivalence was questioned by dissociating an agency-based hierarchy (which would be similar to taxonomy) from a visual-similarity hierarchy. The point about differential focus on faces or bodies for different animals is valid and requires further research to be elucidated.

    For the taxonomy model, is it appropriate that the assumed distance between the Mammal 1 class and the Mammal 2 class is the same as the one between the Mammal 2 class and the Birds class? Is this what we expect in OTC? In terms of Spearman correlations this assumption might be fine, but when the model contributions are partitioned using regression (e.g. in Figure 2) the emphasis does shift to the magnitudes of the distances rather than their ranks. This assumption might be running into a bigger problem when comparisons between the taxonomy model and human-similarity models are made. The human-similarity model seems to capture the differences with the Mammal 1 class, which are collapsed into one measure in the taxonomy model. Might this difference underlie the observed results where the variance captured by the taxonomy model is subsumed by the variance explained by the human-similarity model?
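
    To illustrate why the equal-spacing assumption matters once magnitudes rather than ranks are fitted, here is a small sketch (Python; the 1-6 codes and class ordering are illustrative, not taken from the paper): a monotone warping of the taxonomy distances is indistinguishable by Spearman correlation but not under regression-based partitioning.

    ```python
    import numpy as np
    from scipy.stats import spearmanr
    from scipy.spatial.distance import squareform

    # Illustrative, equally spaced taxonomy codes; RDM entries are |code_i - code_j|.
    codes = np.arange(1, 7)        # e.g. human, mammal 1, mammal 2, bird, reptile, invertebrate
    tax = np.abs(np.subtract.outer(codes, codes)).astype(float)

    # A monotone but non-linear version of the same model: identical ranks,
    # different magnitudes (pairs near the human end are pulled further apart).
    tax_warped = tax ** 2

    v, v_warped = squareform(tax, checks=False), squareform(tax_warped, checks=False)
    print(spearmanr(v, v_warped)[0])   # 1.0: rank correlation cannot tell them apart

    # Under OLS-based variance partitioning, however, the two versions of the
    # "same" taxonomy model can claim different amounts of ROI variance, and a
    # human-similarity model that stretches distances around the Mammal 1 class
    # may absorb variance that the equally spaced taxonomy model cannot.
    ```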

    Would it be possible to acquire confidence intervals for the independent and shared variance explained by the 3 models in Figure 2 (and elsewhere where there is a similar analysis)? That might help us understand whether the individual contribution of, say, the animacy model to OTC is robust. In the same vein, it might be good to indicate the robustness of the differences between the correlations of the different models with L/V-OTC in the figures.
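
    One way to obtain such confidence intervals would be a subject-level bootstrap, sketched below (Python; the arrays and the r2 helper are placeholders standing in for the authors' pipeline, not taken from it):

    ```python
    import numpy as np

    def r2(X, y):
        X = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return 1 - (y - X @ beta).var() / y.var()

    # Placeholder per-subject brain RDMs (n_subjects x n_pairs) and two model RDMs.
    rng = np.random.default_rng(1)
    n_subjects, n_pairs = 16, 190
    subject_rdms = rng.normal(size=(n_subjects, n_pairs))
    taxonomy, human_sim = rng.normal(size=n_pairs), rng.normal(size=n_pairs)

    def unique_tax(brain):
        """Variance explained by taxonomy over and above human-similarity."""
        both = r2(np.column_stack([taxonomy, human_sim]), brain)
        return both - r2(human_sim[:, None], brain)

    # Bootstrap over subjects: resample with replacement, average the RDMs,
    # and recompute the quantity of interest.
    boots = []
    for _ in range(1000):
        idx = rng.integers(0, n_subjects, n_subjects)
        boots.append(unique_tax(subject_rdms[idx].mean(axis=0)))
    ci_low, ci_high = np.percentile(boots, [2.5, 97.5])

    # A 95% CI that excludes zero would indicate that the model's independent
    # contribution is robust; the same recipe applies to the shared terms and
    # to differences between model correlations.
    ```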

    (lines 181-182) "the taxonomic hierarchy is more apparent in VOTC-all, while the face-body division is also still clearly present" What is the significance of this distinction (also echoed in lines 222-223 after the face/body ROI analysis)?

    Across the animals, how correlated are the human-body similarity and human-face similarity RDMs? It seems that different sets of participants provided these two models. Is that the case? Are the correlations between the two models at the noise ceilings of each other? Is there any specificity of model type with ROI type, i.e. does the human-face similarity model correlate more with L/V-OTC face than with L/V-OTC body, and vice versa for the human-body similarity model? Basically, how different are the two models?
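
    Concretely, the comparison I have in mind could look something like the following sketch (Python; the per-participant rating RDMs are simulated placeholders), where the between-model correlation is compared against each model's own split-half reliability:

    ```python
    import numpy as np
    from scipy.stats import spearmanr

    # Placeholder per-participant rating RDMs from the two behavioural tasks
    # (similarity to the human face and to the human body), n_subjects x n_pairs.
    rng = np.random.default_rng(2)
    face_rdms = rng.normal(size=(20, 190))
    body_rdms = rng.normal(size=(20, 190))

    between = spearmanr(face_rdms.mean(0), body_rdms.mean(0))[0]

    def split_half_reliability(rdms, n_splits=1000):
        """Average correlation between group-mean RDMs of random half-splits."""
        rs = []
        for _ in range(n_splits):
            order = rng.permutation(len(rdms))
            half = len(rdms) // 2
            rs.append(spearmanr(rdms[order[:half]].mean(0), rdms[order[half:]].mean(0))[0])
        return np.mean(rs)

    print(between, split_half_reliability(face_rdms), split_half_reliability(body_rdms))

    # If the between-model correlation approaches each model's own reliability,
    # the two human-similarity models are effectively interchangeable; if not,
    # testing whether the face model preferentially fits face ROIs (and the
    # body model body ROIs) becomes informative.
    ```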

    In Figure 4, what do the correlations of the mentioned models with L/V-OTC-object look like? While it is interesting to understand the graded responses in the face and body areas, it might be good to see whether the human-face/body similarity models also explain the graded responses in the, arguably more general, object-selective ROIs. Of course, here the object-selective ROI would share a lot of voxels with the body- and face-selective ROIs and the results might be similar, but it might still make sense to add the object-selective ROI results as a supplemental figure to Figure 4. Also, in Figure 1 it is clear that the 3 ROIs do not cover all of L/V-OTC. In making claims about the representations in OTC at large, would it be useful to also analyse L/V-OTC-all (or go further and get an anatomically-defined region) with the human face/body-similarity models?

    What is the value of the noise ceiling for VOTC-body in Figure 4B?

    Why might the animacy model be negatively correlated with the CNN layer RDMs?

  2. ### Reviewer #2

    The authors sought to reconcile three observations about the organisation of human high-level visual cortex: 1) the reliable presence of focal selective regions for particular categories (especially faces and bodies), 2) broader patterns of brain responses that distinguish animate and inanimate objects, and 3) more recent findings pointing to organisation reflecting a taxonomic hierarchy describing the semantic relationships amongst different species. To this end, they conducted a well-designed and technically sophisticated fMRI study following a representational similarity approach, seeking to pull apart these factors via careful selection of stimuli and comparison of evoked BOLD activity with predicted patterns of (dis)similarity. This was complemented by an analysis comparing similarities of these models with the properties of the deeper layers of several deep neural networks trained to categorise images. The authors draw "deflationary" conclusions, arguing that models of OTC emphasising semantic taxonomy or animacy are unnecessarily complex, and that instead the most powerful organisational principle to account for extant findings is by reference to representations that are anchored specifically on the face and the body.

    1. In many ways, this study is designed as a response to a few specific previous papers on related topics, notably two by Connolly et al., and others by Sha et al. and Thorat et al. One limitation of the paper is that it perhaps relies too much on knowledge of that previous work - for example, points about the "intuitive taxonomic hierarchy" that build on that work were not fully explicated in the Introduction and only became gradually clear through the manuscript. More seriously, I am concerned that the authors' conclusions depend on methodological differences with the other work. The authors focused their analyses on focal regions identified as face-, body-, or object-selective in localiser runs. Judging from Figure 1B, this generates a rather restricted set of regions that are then examined in detail with various RDM analyses. In comparison, some of the previous studies worked with much broader occipito-temporal regions of interest, and/or used searchlight methods to find regions with specific tuning properties without defining regions in advance. To put it more bluntly, the authors may have put their thumb on the scale: by focusing closely on regions that by selection are highly face- or body-selective, they have found that faces and bodies are key drivers of response patterns. So in this light I was confused by the section beginning at line 442 ("Based on this...") in which the authors seem to dismiss the possibility that animacy dimensions are captured over a broader spatial scale, but they have not measured responses at that scale in the present study. In sum: applied to wider regions of occipitotemporal cortex, the same approach might plausibly generate very different findings, complicating the authors' ultimate conclusions.

    2. I was not fully convinced by the inclusion of the DNN analyses. In contrast with the brain/behaviour work, this did not seem strongly hypothesis driven, but rather exploratory, and more revealing of DNN properties than answering the questions about human neuroanatomy that the authors set out in the introduction. Would this part of the study be better reported in more detail, in a different paper?

    3. Looking at Figure 1C - is it the case that each of these data-to-model comparisons is equally well-powered? The three models are not equally complex: the animacy and face-body models are binary, while the taxonomy model makes a more continuous prediction. Potentially, then, this sets a higher statistical bar for the taxonomy model than for the others. That is, it is consistent with a narrower and more specific subset of the space of possible results: the binary models essentially say "A should be larger than B", but the taxonomy model says "A should be larger than B, which should be larger than C, etc.". If not taken into account, this difference might put the taxonomy model at an unfair disadvantage when compared directly against the other two.
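
    The asymmetry in how much structure each model commits to can be made explicit with a small sketch (Python; the six class codes are illustrative): the binary models predict only two levels of dissimilarity, whereas the taxonomy model predicts a complete ordering of all pairwise distances.

    ```python
    import numpy as np
    from scipy.spatial.distance import squareform

    # Six illustrative stimulus classes ordered from human-like to inanimate;
    # the last class stands in for the inanimate (plant) stimuli.
    level = np.arange(6)
    animate = (level < 5).astype(int)

    animacy_rdm = np.not_equal.outer(animate, animate).astype(float)      # binary model
    taxonomy_rdm = np.abs(np.subtract.outer(level, level)).astype(float)  # graded model

    print(np.unique(squareform(animacy_rdm, checks=False)))    # [0. 1.]: two predicted levels
    print(np.unique(squareform(taxonomy_rdm, checks=False)))   # [1. 2. 3. 4. 5.]: full ordering

    # Any brain RDM with larger between- than within-category distances satisfies
    # the binary prediction, whereas only an RDM respecting the full ordering
    # satisfies the taxonomy prediction: this is the sense in which the bar differs.
    ```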

    Minor Comments:

    The authors report a series of VOTC/LOTC "all" analyses, and also a series of analyses of the specific ROIs that compose these unified ROIs (e.g. face- or body-specific regions only). In that sense, these analyses are partly redundant with each other, rather than being independent tests. If I read this correctly, statistical corrections may be in order to account for this non-independence, and/or some tempering of conclusions that treat these as two distinct indices of brain activity.
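
    If corrections are applied, a procedure that remains valid under arbitrary dependence between tests (such as Holm's step-down method) seems appropriate; a minimal sketch (Python, with made-up p-values):

    ```python
    import numpy as np
    from statsmodels.stats.multitest import multipletests

    # Made-up p-values standing in for the overlapping "all" and constituent-ROI tests.
    pvals = np.array([0.004, 0.012, 0.030, 0.046, 0.20])
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
    print(reject, p_adj)

    # Holm (like Bonferroni) controls the family-wise error rate under arbitrary
    # dependence, so it stays valid even though the "all" ROIs share voxels with
    # the face-, body-, and object-selective ROIs.
    ```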

  3. ### Reviewer #1

    In this fMRI study, Ritchie et al. investigated the representation of animal faces and bodies in (human) face- and body-selective regions of OTC, testing whether animal representations reflect similarity to human faces and bodies (as rated by human observers) or a taxonomic hierarchy. Results show that similarity to humans best captures the representational similarity of animal faces and bodies in face- and body-selective regions.

    This is a well-conducted study that convincingly shows that animals' similarity to humans is important for understanding responses to animals in face- and body-selective regions. More generally, it suggests that previously observed selectivity to animals is (at least partly) driven by responses in known (human) face- and body-selective regions. These findings make a lot of sense in the context of earlier work. I was, however, a bit puzzled by the framing of the study and the interpretation of the results. I hope my comments are useful for revising the paper.

    Major comments:

    1. The study is framed around a couple of recent fMRI studies (most notably Sha et al., 2015 and Thorat et al., 2019) claiming that the animacy organization in visual cortex reflects a continuum rather than a dichotomy. The submitted study contrasts this claim with the alternative of a face-body division. The authors conclude that taking into account the face-body division explains away the proposed animacy-continuum account (here taken as a taxonomic hierarchy). I had difficulty following this logic. There seem to be at least three separate questions here: 1) does the animacy organization reflect activity in face/body-selective regions, or are there animate-selective clusters that are different from known face- and body-selective regions? 2) assuming that animals activate known face- and body-selective regions, are responses in these regions organized along a human-similarity continuum? 3) what is the nature of this continuum - conceptual and/or visual? Could you clarify which questions your study addresses? See below for more explanation.

    2. One of the conclusions relates to the first question ("Our results provide support for the idea that OTC is not representing animacy per se, but simply faces and bodies as separate from other ecologically important categories of objects."). I am missing a review of previous work here: there is already strong evidence showing that the animacy organization is closely related to the face/body organization. For example, Kriegeskorte et al. (2008) showed that the animate-inanimate distinction is the top-level distinction in OTC, with the animate category consisting of face and body clusters (rather than human vs animal); see also Grill-Spector & Weiner (2014) for perhaps the leading account of how animacy and face/body selectivity may be hierarchically related. Furthermore, earlier work reported responses to animal faces and bodies in human face- and body-selective regions. For example, Kanwisher et al. (1999) found responses to animal faces "as might be expected given that animal faces share many features with human faces" and concluded: "Thus the response of the FFA is primarily driven by the presence of a face (whether human or animal), not by the presence of an animal or human per se.". Tong et al. (2000) reached similar conclusions. Similar findings were also reported for animal bodies in body-selective regions, with stronger responses to animal bodies (e.g. mammals) that are more similar to humans (Downing et al., 2001; Downing et al., 2006). Considering this literature (none of which is cited in the Introduction), it seems rather well established that the animacy organization is directly related to face/body selectivity, that animal faces/bodies activate human face-/body-selective regions, and that this activation depends on an animal's similarity to human faces/bodies. (More generally, visual similarity is well-known to be reflected in visual cortex activity, including in category-selective regions (e.g. work by Tim Andrews)). It would be helpful if the current study is introduced in the context of this previous work so that it is clear what new insights the current study brings.

    3. Related to the second question, the current results provide convincing evidence for a human-similarity dimension. However, contrary to the claims of the paper, the continua proposed in Sha et al. and Thorat et al. would seem to predict a similar result, considering that these studies defined the animacy continuum in terms of an animal's similarity to humans: Sha et al.: "the degree to which animals share characteristics with the animate prototype-humans."; Thorat et al.: "the animacy organization reflects the degree to which animals share psychological characteristics with humans". To model this dimension, rather than assuming a 1-6 taxonomic hierarchy, participants could rate the animals' similarity to humans, as for example done in Thorat et al. You will likely find that these ratings correlate highly with the visual similarity ratings in the current study. The obvious problem is that animals that are similar to humans tend to share both conceptual and visual properties with humans. By the way: it would be relevant to discuss Contini et al. (2020) in the Introduction, as this paper similarly proposed a human-centric account.

    4. This brings us to the third question, whether "similarity to humans" is purely visual (i.e., image based) or whether conceptual similarity also contributes to explaining responses. Sha et al. could not address this question because their stimuli confounded the two dimensions. However, it was not clear to me that the submitted study can address this question any better, considering that the stimuli were not designed for distinguishing the two dimensions either: bodies/faces that are visually more similar to humans will belong to animals that are conceptually more similar to humans as well.

    5. The study is quite narrowly focused on debunking the taxonomic hierarchy supposedly proposed by previous studies. If this is the goal, you would need to stay close to these previous studies in terms of analyses and regions of interest. If not, it is hard to compare results across studies. For example, the abstract states that: "previous studies suggest this animacy organization reflects the representation of an intuitive taxonomic hierarchy, distinct from the presence of face- and body-selective areas in OTC." I'm not sure who made this claim, but if this is the claim that you want to test, wouldn't you need to look outside of face- and body-selective regions for this taxonomic hierarchy? Or if the study is a follow-up to Sha et al., then it would be useful to see their analyses repeated here, or at least results presented in comparable ROIs. Alternatively, you could detach the research question from these studies and focus more on animal representations in face- and body-selective regions (after introducing what we know about these regions).

    Minor comments:

    1. The third paragraph of the Introduction mentions "these studies", but it is not clear which specific studies you refer to (the preceding paragraph cites many studies).

    2. Did you correct for multiple comparisons when comparing the models (e.g. p.10)?

    3. Could the human-similarity ratings partly reflect conceptual similarity? Might it not be hard for participants to distinguish purely visual properties from more conceptual properties? Perhaps the DNNs can be used to create an image-based human-similarity score? (A sketch of one such score is given after these minor comments.)

    4. It was not entirely clear to me what the DNNs added to the study (which asks a question about human visual cortex). These are also not really introduced in the Introduction, and are only briefly mentioned in the Abstract. Was the idea to directly compare representations in DNNs to those in OTC?

    5. p.15: refers to Figures 6A and 6B instead of 4A and 4B
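
    Regarding minor comment 3, an image-based human-similarity score could in principle be computed directly from a pretrained network. The sketch below (Python/PyTorch; the image paths are placeholders and ResNet-18 is chosen only for illustration, not because it was used in the paper) scores each animal image by the cosine similarity of its penultimate-layer features to the mean features of a set of human face/body images:

    ```python
    import torch
    import torchvision.models as models
    from torchvision import transforms
    from PIL import Image

    # Feature extractor: ResNet-18 with the classification head removed, so the
    # model returns 512-dimensional penultimate-layer features.
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = torch.nn.Identity()
    model.eval()

    prep = transforms.Compose([
        transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    def features(path):
        with torch.no_grad():
            return model(prep(Image.open(path).convert("RGB")).unsqueeze(0)).squeeze(0)

    # Placeholder paths to human images; their mean feature vector is the "prototype".
    human_proto = torch.stack([features(p) for p in ["human1.jpg", "human2.jpg"]]).mean(0)

    def image_based_human_similarity(path):
        return torch.nn.functional.cosine_similarity(features(path), human_proto, dim=0).item()

    # Behavioural ratings driven purely by visual features should track this score;
    # residual variance in the ratings would then be a candidate for conceptual similarity.
    ```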

  4. ## Preprint Review

    This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    ### Summary:

    The reviewers agreed that your paper reports a well-conducted study revealing several interesting results. However, they were ultimately not convinced that one of the main conclusions of the paper – the absence of an animal taxonomy – was sufficiently supported by the presented data, also considering the difference in analysis methods compared to previous studies. Furthermore, they noted that the reported results are somewhat incremental relative to earlier work reporting responses to animal faces/bodies in face-/body-selective regions.