Dynamics of vision: Grouping takes longer than crowding

Abstract

Vision is often understood as a hierarchical, feedforward process in which processing proceeds from low-level features to high-level representations, ultimately resulting in a coherent percept. Within just a few tens of milliseconds, the fundamental features of the percept are established. These traditional models are the primary framework for understanding visual crowding, a perceptual phenomenon in which the perception of a target is impaired by the presence of flanking elements. In this framework, the early stages of feedforward visual processing are essential for perceiving stimuli under crowding conditions, and temporal aspects are not important. Indeed, experiments show that stimulus duration has very little effect on perception with classic crowding displays. Here, we show that for more complex displays, crowding is the outcome of highly dynamic processes. For example, a 20 ms preview of just the flankers can reduce crowding for displays in which the target and the same flankers are presented up to 1 second later. This effect occurs only if the flankers have the potential to group during longer stimulus durations. Our findings align with predictions made by the LAMINART model, which employs recurrent segmentation processes that unfold over time to separate objects into distinct representation layers. Taken together, our results highlight the importance of time-consuming grouping processes in spatial and temporal interactions in vision. In contrast to the classic feedforward models of vision, we propose that crowding, and vision in general, is a dynamic process in which multiple potential interpretations of a stimulus are modulated and gated by grouping mechanisms that evolve over time.
