Quantifying infants’ everyday experiences with objects in a large corpus of egocentric videos

Abstract

While modern vision-language models are typically trained on millions of curated photographs, infants learn visual categories and the words that refer to them from very different training data. Here, we investigate which objects infants actually encounter in their everyday environments and how often they encounter them. We use a large corpus of egocentric videos recorded from the infant’s perspective (868 hours of video; N = 31 participants), applying and validating a recent object detection model (YOLOE) to detect a set of categories that are frequently named in children’s early vocabulary. We find that infants’ visual experience is dominated by a small set of objects, with differences in individual children’s home environments driving variability. We also find that young children tend to learn words earlier for more frequently encountered categories. These results suggest that visual experience scaffolds young children’s early category and language learning, and they highlight that ecologically valid computational models of category learning must be able to accommodate skewed input distributions.
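
To make the detection step concrete, here is a minimal sketch of text-prompted object detection with YOLOE via the Ultralytics interface, run on a single frame sampled from an egocentric video. The checkpoint name, category list, confidence threshold, and frame path are illustrative assumptions, not the authors’ actual pipeline or configuration.

```python
# Minimal sketch (not the authors' pipeline): open-vocabulary detection
# with YOLOE, restricted to a text-prompted category list, on one frame.
# Checkpoint, categories, and paths below are illustrative assumptions.
from ultralytics import YOLOE

# A small subset of categories frequently named in early child
# vocabularies (the paper uses its own curated category set).
categories = ["ball", "cup", "book", "dog", "spoon", "chair"]

model = YOLOE("yoloe-11s-seg.pt")  # hypothetical checkpoint choice
# Constrain detection to the chosen vocabulary via text prompts.
model.set_classes(categories, model.get_text_pe(categories))

# Detect objects in one sampled frame and tally category counts.
results = model.predict("frame_000123.jpg", conf=0.25)
counts = {}
for box in results[0].boxes:
    name = results[0].names[int(box.cls)]
    counts[name] = counts.get(name, 0) + 1
print(counts)  # e.g. {"cup": 2, "chair": 1}
```

Aggregating such per-frame counts over all sampled frames and participants would yield the kind of skewed frequency distribution over categories that the abstract describes.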
