The psychophysics of compositionality: Relational scene perception occurs in a canonical order
Abstract
We see not only objects and their features (e.g., glass vases or wooden tables) but also relations between them (e.g., a vase on a table). An emerging view accounts for such relational representations by positing that visual perception is compositional: Much like language, where words combine to form phrases and sentences, many visual representations contain discrete constituents that combine systematically. This perspective raises a fundamental question: What principles guide the composition of relational representations, and how are they built over time? Here, we tested the hypothesis that the mind constructs relational representations in a canonical order. Inspired by a distinction from cognitive linguistics, we predicted that 'reference' objects (typically large, stable, and able to physically control other objects; e.g., tables) take precedence over 'figure' objects (e.g., vases) during scene composition. In Experiment 1, participants who arranged items to match linguistic descriptions (e.g., "The vase is on the table", "The table is supporting the vase") consistently placed reference objects first (e.g., table, then vase). Experiments 2–5 extended these findings to visual recognition itself: participants were faster to verify scene descriptions when reference objects appeared before figure objects in a scene, rather than vice versa. This Reference-first advantage emerged rapidly (within 100 ms), persisted in a purely visual task, and reflected abstract principles (e.g., physical forces) beyond simple differences in size or shape. Our findings reveal psychophysical principles underlying compositionality in visual processing: the mind builds representations of object relations sequentially, guided by the objects' roles in those relations.