Beyond category-supervision: instance-level contrastive learning models predict human visual system responses to objects

Talia Konkle
George A. Alvarez

Abstract

Anterior regions of the ventral visual stream have substantial information about object categories, prompting theories that category-level forces are critical for shaping visual representation. The strong correspondence between category-supervised deep neural networks and ventral stream representation supports this view, but does not provide a viable learning model, as these deepnets rely upon millions of labeled examples. Here we present a fully self-supervised model which instead learns to represent individual images, where views of the same image are embedded nearby in a low-dimensional feature space, distinctly from other recently encountered views. We find category information implicitly emerges in the feature space, and critically that these models achieve parity with category-supervised models in predicting the hierarchical structure of brain responses across the human ventral visual stream. These results provide computational support for learning instance-level representation as a viable goal of the ventral stream, offering an alternative to the category-based framework that has been dominant in visual cognitive neuroscience.
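The instance-level objective described above can be illustrated with a small sketch. The abstract does not specify the loss, so this assumes a generic InfoNCE-style contrastive loss; the function name `instance_contrastive_loss`, the `temperature` value, and the fixed-size `recent` memory are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def instance_contrastive_loss(view_a, view_b, recent_views, temperature=0.2):
    """InfoNCE-style loss for one image (an assumed form, not the paper's).

    The loss is low when view_a is close to view_b (two views of the same
    image) and far from embeddings of other recently encountered views.
    """
    a = l2_normalize(view_a)
    positive = dot(a, l2_normalize(view_b)) / temperature
    negatives = [dot(a, l2_normalize(r)) / temperature for r in recent_views]
    logits = [positive] + negatives
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - positive


# Hypothetical fixed-size memory of recently encountered view embeddings.
recent = deque(maxlen=4)
recent.extend([[0.0, 1.0], [-1.0, 0.0]])

# Two views of the same image that land close together give a small loss.
aligned = instance_contrastive_loss([1.0, 0.0], [0.9, 0.1], recent)
# A second view that collides with a stored negative gives a larger loss.
misaligned = instance_contrastive_loss([1.0, 0.0], [0.0, 1.0], recent)
```

In this toy setting `aligned` is smaller than `misaligned`, which is the pressure that pulls views of one image together and pushes them away from other recent views. No category labels enter the computation at any point.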

Version published on bioRxiv on May 30, 2021 (DOI: 10.1101/2021.05.28.446118)
