Learning to See Like a Child: Why Viewpoint Diversity is Fundamental for Human-Aligned Object Recognition


Abstract

Deep convolutional neural networks match human accuracy on standard object recognition tasks but fail to recognize familiar objects from novel viewpoints. Humans, by contrast, develop viewpoint-invariant recognition at an early age through diverse visual experience. This gap in visual experience may explain why models diverge from humans in object recognition. Holding dataset size constant, we show that greater viewpoint diversity substantially improves generalization to novel views. Using a synthetic 3D dataset with systematically controlled viewpoints, we reveal a core trade-off: restricted-view training yields rapid learning and near-ceiling in-distribution accuracy but collapses on held-out viewpoints, whereas viewpoint-diverse training learns more gradually yet generalizes robustly. Increasing viewpoint diversity disrupts texture regularities while preserving global shape, driving networks to prioritize shape over texture, the same strategy that underlies human object recognition. Partitioned Grad-CAM analyses further show that viewpoint-diverse models maintain object-centered attention. These findings parallel developmental accounts of multi-view learning and identify viewpoint diversity as an important factor for robust, human-aligned vision.
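The abstract's central manipulation is to vary viewpoint diversity while holding dataset size constant, then test on held-out viewpoints. The abstract does not say how the splits were built, so the following is only a minimal sketch under stated assumptions. It assumes a hypothetical 12 × 3 azimuth/elevation camera grid, a fixed set of 12 held-out views shared by every condition, and illustrative helper names (`viewpoint_grid`, `make_split`) that do not come from the paper. Each object always contributes the same number of training samples; only the number of distinct viewpoints those samples are drawn from changes.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: a viewpoint is an (azimuth, elevation) pair in degrees.
Viewpoint = Tuple[int, int]
Sample = Tuple[str, Viewpoint]  # (object id, viewpoint) pair to be rendered


def viewpoint_grid(az_step: int = 30,
                   elevations: Sequence[int] = (-30, 0, 30)) -> List[Viewpoint]:
    """Enumerate a hypothetical azimuth x elevation camera grid (36 views by default)."""
    return [(az, el) for el in elevations for az in range(0, 360, az_step)]


def make_split(objects: Sequence[str], n_train_views: int,
               images_per_object: int, n_held_out: int = 12,
               seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Build train/test lists in which only viewpoint diversity varies.

    Every object contributes exactly `images_per_object` training samples,
    so the training-set size is identical for every value of `n_train_views`.
    """
    grid = viewpoint_grid()
    random.Random(seed).shuffle(grid)
    # The held-out views depend only on the seed, so every diversity
    # condition is evaluated on the same novel viewpoints.
    held_out = grid[:n_held_out]
    pool = grid[n_held_out:]
    if not 1 <= n_train_views <= len(pool):
        raise ValueError(f"n_train_views must be in [1, {len(pool)}]")
    # Prefixes of one shuffled pool: low-diversity view sets are subsets
    # of high-diversity ones, so conditions are nested.
    train_views = pool[:n_train_views]
    # Cycle through the allowed views. With few views the same viewpoint
    # repeats; a real rendering pipeline might vary lighting or jitter here.
    train = [(obj, train_views[i % n_train_views])
             for obj in objects for i in range(images_per_object)]
    test = [(obj, v) for obj in objects for v in held_out]
    return train, test


if __name__ == "__main__":
    objs = ["mug", "chair", "lamp"]
    for k in (1, 4, 24):
        train, test = make_split(objs, n_train_views=k, images_per_object=24)
        n_views = len({v for _, v in train})
        print(f"views={k:2d}  train={len(train)}  "
              f"distinct_train_views={n_views}  test={len(test)}")
```

Running the script prints one line per condition. The training-set size and the held-out test set stay fixed while the number of distinct training viewpoints grows. Under this sketch, that is the sense in which any accuracy difference on `test` can be attributed to viewpoint diversity rather than to the amount of data.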
