Human-like monocular depth biases in deep neural networks

Abstract

Human depth perception from 2D images is systematically distorted, yet the nature of these distortions is not fully understood. To gain insights into this fundamental problem, we compare human depth judgments with those of deep neural networks (DNNs), which have shown remarkable abilities in monocular depth estimation. Using a novel human-annotated dataset of natural indoor scenes and a systematic analysis of absolute depth judgments, we investigate error patterns in both humans and DNNs. Employing exponential-affine fitting, we decompose depth estimation errors into depth compression, per-image affine transformations (including scaling, shearing, and translation), and residual errors. Our analysis reveals that human depth judgments exhibit systematic and consistent biases, including depth compression, a vertical bias (perceiving objects in the lower visual field as closer), and consistent per-image affine distortions across participants. Intriguingly, we find that DNNs with higher accuracy partially recapitulate these human biases, demonstrating greater similarity in affine parameters and residual error patterns. This suggests that these seemingly suboptimal human biases may reflect efficient, ecologically adapted strategies for depth inference from inherently ambiguous monocular images. However, while DNNs capture metric-level residual error patterns similar to humans, they fail to reproduce human-level accuracy in ordinal depth perception within the affine-invariant space. These findings underscore the importance of evaluating error patterns beyond raw accuracy, providing new insights into how humans and computational models resolve depth ambiguity. Our dataset and methodology provide a framework for evaluating the alignment between computational models and human perceptual biases, thereby advancing our understanding of visual space representation and guiding the development of models that more faithfully capture human depth perception.
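To make the exponential-affine decomposition concrete, the sketch below shows one plausible way to fit such a model per image. This is not the authors' implementation: it assumes judged depth is an affine function (scale, shear in image coordinates, and translation) of exponentially compressed true depth, with the leftover variance treated as residual error. All parameter names (s, k, hx, hy, t) and the specific compression form are illustrative.

```python
# Minimal sketch of an exponential-affine fit, assuming the model
#     judged ~ s * (1 - exp(-k * d)) / k + hx * x + hy * y + t
# where d is true depth and (x, y) are image coordinates. As k -> 0 the
# compression term approaches d (no compression); a negative hy would
# capture a vertical bias (lower visual field judged closer).
import numpy as np
from scipy.optimize import least_squares

def exp_affine_model(params, d, x, y):
    """Predict judged depth from true depth d and image coordinates (x, y)."""
    s, k, hx, hy, t = params
    compressed = (1.0 - np.exp(-k * d)) / k      # exponential depth compression
    return s * compressed + hx * x + hy * y + t  # per-image affine transform

def fit_exp_affine(judged, d, x, y):
    """Fit the model for one image; return parameters and residual errors."""
    def residuals(params):
        return exp_affine_model(params, d, x, y) - judged
    init = np.array([1.0, 0.1, 0.0, 0.0, 0.0])
    fit = least_squares(
        residuals, init,
        bounds=([0.0, 1e-6, -np.inf, -np.inf, -np.inf], np.inf),
    )
    return fit.x, residuals(fit.x)

# Example on synthetic data for a single image:
rng = np.random.default_rng(0)
d = rng.uniform(0.5, 8.0, 200)        # true depths in metres
x, y = rng.uniform(-1, 1, (2, 200))   # normalised image coordinates
true_params = np.array([6.0, 0.3, 0.2, -0.8, 0.5])
judged = exp_affine_model(true_params, d, x, y) + rng.normal(0, 0.1, 200)
params, resid = fit_exp_affine(judged, d, x, y)
print(params, np.std(resid))
```

Once the affine and compression components are fitted per image, the remaining residuals can be compared between humans and DNNs, which is the basis for the similarity analysis described above.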

Author summary

Understanding the characteristic errors in depth judgments made by humans and deep neural networks (DNNs) provides a foundation for developing functional models of the human brain and artificial models with enhanced interpretability. To address this, we constructed a human depth judgment dataset using indoor photographs and compared human depth judgments with those of DNNs. Our results show that humans systematically compress far distances and exhibit distortions resembling viewpoint shifts, which remain remarkably consistent across observers. Strikingly, the better the DNNs were at depth estimation, the more they exhibited human-like biases. This suggests that these seemingly suboptimal human biases may in fact reflect efficient strategies for inferring 3D structure from ambiguous 2D inputs. However, we also found a limit: while DNNs mimicked some human errors, they fell short of humans at judging the relative depth order of objects, especially once viewpoint distortions were accounted for. We believe that our dataset and the identification of multiple error factors will drive further comparative studies between humans and DNNs, facilitating model evaluations that go beyond simple accuracy to uncover how depth perception truly works, and how it might best be replicated in computational models.
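One way to make the ordinal comparison concrete: score the fraction of point pairs whose depth order agrees between judgments (or model predictions) and ground truth, after each image's best-fitting affine transform has been removed. The sketch below assumes that reading of "affine-invariant"; the margin parameter and all names are illustrative rather than the paper's code.

```python
# Minimal sketch of pairwise ordinal depth accuracy, assuming inputs are
# affine-corrected depths (e.g., residuals from a per-image affine fit).
import numpy as np
from itertools import combinations

def ordinal_accuracy(pred, truth, margin=0.0):
    """Fraction of point pairs whose depth order agrees between pred and truth.

    Pairs whose true depth difference is within `margin` are skipped,
    mirroring the common practice of ignoring near-ties.
    """
    correct, total = 0, 0
    for i, j in combinations(range(len(truth)), 2):
        dt = truth[i] - truth[j]
        if abs(dt) <= margin:
            continue
        correct += (dt > 0) == (pred[i] - pred[j] > 0)
        total += 1
    return correct / total if total else float("nan")

# Example: noisy but mostly order-preserving predictions score close to 1.
rng = np.random.default_rng(1)
truth = rng.uniform(0.5, 8.0, 50)
pred = truth + rng.normal(0, 0.5, 50)
print(ordinal_accuracy(pred, truth, margin=0.1))
```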
