Object representations reflect hierarchical scene structure and depend on high-level visual, semantic, and action information.
Abstract
Objects in real-world scenes typically follow regular, hierarchically structured arrangements, where anchor objects (e.g., a sink) guide the placement and identification of associated local objects (e.g., a toothbrush), forming clusters referred to as "phrases." A scene can comprise multiple such phrases, depending on the actions performed within it. According to the scene grammar framework, these hierarchical relationships enable the brain to efficiently process complex scenes and guide behavior by leveraging the statistical regularities of object arrangements. This study investigates whether shared neural representations of objects reflect this hierarchical organization. Using EEG, we explored the temporal dynamics of phrase-specific shared object representations through cross-classification analysis. Specifically, classifiers were trained on neural data from one object type (either the anchor or its associated local object) and tested for their ability to generalize to other objects within and between phrases, as well as across different scenes. Our findings reveal an early time window (128–164 ms) that supports the existence of phrase-specific shared object representations. Additionally, representational similarity analysis (RSA) revealed that these shared representations predominantly rely on high-level visual and semantic features, as well as implied actions, rather than low-level visual similarities. Notably, "upward" generalization (local to anchor) is driven primarily by high-level visual and semantic features, while "downward" generalization (anchor to local) is influenced by high-level semantic features and implied actions. These findings provide evidence that early object processing in the brain mirrors the behaviorally relevant hierarchical structure of real-world scenes.
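The cross-classification logic described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the classifier, so a nearest-centroid classifier stands in for it, and the data are synthetic. The function names, array shapes, channel count, and the time indices of the simulated "shared" window are all assumptions chosen for illustration. A classifier is trained at each time point on trials from one object type and tested on trials from the other, so above-chance accuracy indicates a representation shared between the two.

```python
import numpy as np


def fit_centroids(X, y):
    """Nearest-centroid stand-in classifier: one mean pattern per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_centroids(classes, centroids, X):
    """Assign each trial to the class with the closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def cross_classify(train_X, train_y, test_X, test_y):
    """Train on one object type, test on another, separately per time point.

    train_X, test_X: arrays of shape (n_trials, n_channels, n_times).
    Returns an array of accuracies with one entry per time point.
    """
    n_times = train_X.shape[2]
    acc = np.empty(n_times)
    for t in range(n_times):
        classes, centroids = fit_centroids(train_X[:, :, t], train_y)
        pred = predict_centroids(classes, centroids, test_X[:, :, t])
        acc[t] = np.mean(pred == test_y)
    return acc


def simulate(rng, n_trials, n_channels, n_times, shared, window):
    """Synthetic trials (hypothetical): labels 0/1 denote two phrases.

    The phrase-specific pattern `shared` is added only inside `window`.
    """
    y = np.arange(n_trials) % 2                      # balanced phrase labels
    X = rng.normal(size=(n_trials, n_channels, n_times))
    sign = np.where(y == 1, 1.0, -1.0)
    X[:, :, window] += sign[:, None, None] * shared[None, :, None]
    return X, y


rng = np.random.default_rng(0)
shared = rng.normal(size=32)   # assumed pattern common to anchor and local objects
window = slice(4, 8)           # arbitrary time indices, not the reported 128–164 ms

anchor_X, anchor_y = simulate(rng, 200, 32, 12, shared, window)
local_X, local_y = simulate(rng, 200, 32, 12, shared, window)

downward = cross_classify(anchor_X, anchor_y, local_X, local_y)  # anchor to local
upward = cross_classify(local_X, local_y, anchor_X, anchor_y)    # local to anchor
```

In this toy setup, `downward` and `upward` should rise above chance only inside the simulated window, which mirrors how a time-limited generalization effect would appear in a time-resolved cross-decoding curve. Real EEG analyses would typically add cross-validation, preprocessing, and statistical testing, none of which is shown here.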