How the visual brain can learn to parse images using a multiscale, incremental grouping process

Sami Mollard
Sander M. Bohte
Pieter R. Roelfsema



Abstract

Natural scenes usually contain many objects that need to be segregated from each other and from the background. Object-based attention is the process that groups image fragments belonging to the same object. Curve-tracing tasks provide a special case, testing our ability to group the image elements of an elongated curve. In the brain, curve tracing is associated with the gradual spread of enhanced neuronal activity over the representation of the traced curve. Previous studies demonstrated that tracing is faster when curves are far apart than when they are nearby. One hypothesis is that a larger distance between curves permits activity propagation in higher visual cortical areas, where receptive fields are larger and connections exist between neurons representing image regions that are farther apart (Pooresmaeili et al., 2014). We propose a recurrent architecture for the scale-invariant tracing of curves and objects. The architecture is composed of a feedforward pathway that dynamically selects the appropriate scale for tracing and a recurrent pathway that propagates enhanced neuronal activity through horizontal and feedback connections, enabled by a disinhibitory loop involving VIP and SOM interneurons. We trained the network using a biologically plausible reinforcement learning scheme and observed that training on short curves allowed the network to generalize to longer curves and to 2D objects. The network chose the scale based on the distance between curves and the width of objects, just as in human psychophysics and in the visual cortex of monkeys. The results provide a mechanistic account of the learning and execution of multiscale perceptual grouping in the brain.
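The speed/scale trade-off described above can be illustrated with a toy sketch. This is not the authors' model: instead of a trained recurrent network, it uses a hand-written flood fill as a stand-in for the spread of enhanced activity, and a max-pooling pyramid as a stand-in for coarser visual areas with larger receptive fields. The functions `max_pool`, `spread` and `coarsest_safe_scale` are hypothetical, and the scale-selection rule assumes a known distractor location, which the real network does not receive. Under these assumptions, tracing at a coarser scale needs fewer propagation steps, but only while the target and distractor curves stay separate at that scale.

```python
# Toy illustration only: a hand-written flood fill over a max-pooling pyramid,
# NOT the trained network described in the abstract.

def max_pool(mask, s):
    """Downsample a binary image (list of rows) by taking the max over s x s blocks."""
    h, w = len(mask), len(mask[0])
    return [
        [
            int(any(mask[y][x]
                    for y in range(by * s, min((by + 1) * s, h))
                    for x in range(bx * s, min((bx + 1) * s, w))))
            for bx in range((w + s - 1) // s)
        ]
        for by in range((h + s - 1) // s)
    ]


def spread(mask, seed, s=1):
    """Spread 'enhanced activity' from seed (row, col) to 4-connected active cells at scale s.

    Returns the labelled cells (in pooled coordinates) and the number of
    synchronous iterations in which the labelled set grew.
    """
    grid = max_pool(mask, s)
    rows, cols = len(grid), len(grid[0])
    start = (seed[0] // s, seed[1] // s)
    labelled = {start}
    frontier = [start]
    steps = 0
    while frontier:
        new = []
        for y, x in frontier:
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] and (ny, nx) not in labelled:
                    labelled.add((ny, nx))
                    new.append((ny, nx))
        if new:
            steps += 1  # one more time step of activity propagation
        frontier = new
    return labelled, steps


def coarsest_safe_scale(mask, seed, distractor, scales=(1, 2, 4, 8)):
    """Hypothetical rule: the coarsest scale at which activity started on the target
    does not reach a known distractor pixel (falls back to the finest scale)."""
    best = scales[0]
    for s in scales:
        labelled, _ = spread(mask, seed, s)
        if (distractor[0] // s, distractor[1] // s) in labelled:
            break  # curves have merged at this scale
        best = s
    return best


# Two parallel horizontal "curves", 64 px long and 3 rows apart.
W = 64
mask = [[0] * W for _ in range(8)]
for x in range(W):
    mask[1][x] = 1  # target curve
    mask[4][x] = 1  # nearby distractor curve

for s in (1, 2, 4):
    cells, steps = spread(mask, (1, 0), s)
    print(s, steps)
# prints 1 63, then 2 31, then 4 16 (scale 4 is fastest but has merged the curves)
print(coarsest_safe_scale(mask, (1, 0), (4, 0)))  # prints 2

# The same curves placed far apart tolerate a much coarser scale.
far = [[0] * W for _ in range(64)]
for x in range(W):
    far[1][x] = 1
    far[60][x] = 1
print(coarsest_safe_scale(far, (1, 0), (60, 0)))  # prints 8
```

In this sketch the number of propagation steps roughly halves each time the scale doubles, and the usable scale grows with the distance between the curves, which mirrors the qualitative pattern the abstract reports; the actual mechanism in the paper is learned rather than hand-coded.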

Significance Statement

In our perception, image elements that belong to the same object are grouped by object-based attention. Object-based attention corresponds to an enhanced neuronal representation of the image elements that are grouped in perception, in multiple areas of the visual cortex. During perceptual grouping tasks, this enhanced neuronal activity spreads gradually over an object representation, with a speed that depends on the distance between the relevant object and other objects. Here we propose a neuronal mechanism that learns scale-invariant spread of object-based attention and accounts for psychophysical observations in human observers and the pattern of neuronal activity in the visual cortex of monkeys. This work sheds light on the mechanisms for multiscale object-based attention in the visual cortex.

Version published to 10.1101/2024.06.17.599272v3 on bioRxiv
Apr 23, 2025
Version published to 10.1101/2024.06.17.599272v2 on bioRxiv
Mar 24, 2025
Version published to 10.1101/2024.06.17.599272v1 on bioRxiv
Jun 17, 2024

Related articles

The cortical scene processing network emerges in infancy, prior to independent navigation experience

This article has 7 authors:
1. Frederik S. Kamps
2. Emily M. Chen
3. Haoyu Du
4. Heather L. Kosakowski
5. Ariel Fuchs
6. Nancy Kanwisher
7. Rebecca Saxe
This article has no evaluations. Latest version Jun 16, 2025

Retinotopic scaffolding of high-level vision

This article has 3 authors:
1. Nicholas M. Blauch
2. Marlene Behrmann
3. David C. Plaut
This article has no evaluations. Latest version Jun 16, 2025