Multimodal Human Perception of Object Dimensions: Evidence from Deep Neural Networks and Large Language Models
Abstract
Human object recognition relies on both perceptual and semantic dimensions. Here, we examined how deep neural networks (DNNs) and large language models (LLMs) capture and integrate human-derived dimensions of object similarity. We extracted layer activations from CORnet-S and obtained BERT embeddings for 1853 images from the THINGS dataset, and used support vector regression (SVR) to quantify how much variance each representation explained in the human-derived dimensions. Results showed that multimodal integration improved predictions in early visual processing but offered limited additional benefit at later stages, suggesting that deep perceptual processing already encodes meaningful object representations.
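The regression step described above can be illustrated with a minimal sketch: an SVR predicts one human-derived dimension from unimodal features and from their concatenation, with cross-validated R² as the measure of explained variance. This is not the authors' code; the random arrays stand in for precomputed CORnet-S activations and BERT embeddings, and the feature dimensionalities, kernel, and concatenation-based "multimodal" integration are assumptions.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_images = 1853                                        # number of THINGS images used here
cornet_feats = rng.standard_normal((n_images, 512))    # placeholder for CORnet-S layer activations
bert_feats = rng.standard_normal((n_images, 768))      # placeholder for BERT embeddings
dimension = rng.standard_normal(n_images)              # placeholder for one human-derived dimension

def explained_variance(features, target):
    """Cross-validated R^2 of an SVR predicting the target dimension."""
    model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))
    return cross_val_score(model, features, target, cv=5, scoring="r2").mean()

r2_visual = explained_variance(cornet_feats, dimension)
r2_semantic = explained_variance(bert_feats, dimension)
# "Multimodal" here is simple feature concatenation; the paper may integrate the modalities differently.
r2_multimodal = explained_variance(np.hstack([cornet_feats, bert_feats]), dimension)

print(f"visual R^2: {r2_visual:.3f}  semantic R^2: {r2_semantic:.3f}  multimodal R^2: {r2_multimodal:.3f}")
```

In practice, the placeholder arrays would be replaced with real activations extracted per layer of CORnet-S and sentence- or word-level BERT embeddings, and the comparison repeated across layers to trace where multimodal features add predictive value.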