Learning to See Like a Child: Why Viewpoint Diversity is Fundamental for Human-Aligned Object Recognition

Yifan Luo
Niklas Müller
H. Steven Scholte

Abstract

Deep convolutional neural networks match human accuracy on standard object recognition tasks but fail to recognize familiar objects from novel viewpoints. Humans, however, develop viewpoint-invariant recognition at an early age through diverse visual experience. This gap in visual experience may explain why models diverge from humans in object recognition. Holding dataset size constant, we show that greater viewpoint diversity substantially improves generalization to novel views. Using a synthetic 3D dataset with systematically controlled viewpoints, we reveal a core trade-off: restricted-view training yields rapid learning and near-ceiling in-distribution accuracy but collapses on held-out viewpoints, whereas viewpoint-diverse training learns more gradually yet generalizes robustly. Increasing viewpoint diversity disrupts texture regularities while preserving global shape, driving networks to prioritize shape over texture, the same strategy that underlies human object recognition. Partitioned Grad-CAM analyses further show that viewpoint-diverse models maintain object-centered attention. These findings parallel developmental accounts of multi-view learning and identify viewpoint diversity as an important factor for robust, human-aligned vision.
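
The "holding dataset size constant" design can be pictured with a minimal sketch. This is not the authors' pipeline: the function name `assign_viewpoints`, the azimuth-only parameterization, and the counts (1000 images, 2 versus 36 views) are all assumptions chosen for illustration. It only shows how two training sets can share the same number of images while differing in how many distinct camera angles they cover.

```python
def assign_viewpoints(n_images, n_views):
    """Hypothetical helper: give each image one camera azimuth in degrees.

    The n_views angles are evenly spaced around the object and assigned
    round-robin, so the total image count stays fixed no matter how many
    distinct viewpoints are used.
    """
    angles = [i * 360.0 / n_views for i in range(n_views)]
    return [angles[i % n_views] for i in range(n_images)]


# Same dataset size, different viewpoint diversity (illustrative numbers).
restricted = assign_viewpoints(n_images=1000, n_views=2)
diverse = assign_viewpoints(n_images=1000, n_views=36)

print(len(restricted), len(set(restricted)))
print(len(diverse), len(set(diverse)))
```

Under these assumptions, the restricted set repeats each of its few views many times, while the diverse set spreads the same image budget over many angles; generalization would then be measured on angles that neither set contains.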

Version published to 10.64898/2026.05.27.727784 on bioRxiv
May 27, 2026

Related articles

Exposure to naturalistic occlusion promotes generalized, human-like robustness in deep neural networks

This article has 2 authors:
1. David D Coggan
2. Frank Tong
This article has no evaluations
Latest version Apr 27, 2026

Pretraining Objective Shapes Cross-Category Generalization in Affective Image Prediction: A Geometric Comparison of Vision Transformer Encoders

This article has 8 authors:
1. Shohei Tsuchimoto
2. Yuka O Okazaki
3. Kenichi Yuasa
4. Sakura Nishijima
5. Mebuki Izumiya
6. Makoto Hagihara
7. Ryo Fujihira
8. Keiichi Kitajo
This article has no evaluations
Latest version May 13, 2026

Prior scene context reshapes feature reliance during rapid perception

This article has 4 authors:
1. Sule Tasliyurt-Celebi
2. Benjamin de Haas
3. Melissa L.-H. Võ
4. Katharina Dobs
This article has no evaluations
Latest version May 18, 2026
