Quantifying infants’ everyday experiences with objects in a large corpus of egocentric videos

Abstract

While modern vision-language models are typically trained on millions of curated photographs, infants learn visual categories and the words that refer to them from very different training data. Here, we investigate which objects infants actually encounter in their everyday environments and how often they encounter them. We use a large corpus of egocentric videos recorded from the infant’s perspective (868 hours of video; N = 31 participants), applying and validating a recent object detection model (YOLOE) to detect a set of categories that are frequently named in children’s early vocabulary. We find that infants’ visual experience is dominated by a small set of objects, with differences in individual children’s home environments driving variability. We also find that young children tend to learn words earlier for more frequently encountered categories. These results suggest that visual experience scaffolds young children’s early category and language learning, and they highlight that ecologically valid computational models of category learning must be able to accommodate skewed input distributions.
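
To make the detection step concrete, here is a minimal sketch of text-prompted object detection with YOLOE via the Ultralytics interface, run on a single frame sampled from an egocentric video. The checkpoint name, category list, confidence threshold, and frame path are illustrative assumptions, not the authors’ actual pipeline or configuration.

```python
# Minimal sketch (not the authors' pipeline): open-vocabulary detection
# with YOLOE, restricted to a text-prompted category list, on one frame.
# Checkpoint, categories, and paths below are illustrative assumptions.
from ultralytics import YOLOE

# A small subset of categories frequently named in early child
# vocabularies (the paper uses its own curated category set).
categories = ["ball", "cup", "book", "dog", "spoon", "chair"]

model = YOLOE("yoloe-11s-seg.pt")  # hypothetical checkpoint choice
# Constrain detection to the chosen vocabulary via text prompts.
model.set_classes(categories, model.get_text_pe(categories))

# Detect objects in one sampled frame and tally category counts.
results = model.predict("frame_000123.jpg", conf=0.25)
counts = {}
for box in results[0].boxes:
    name = results[0].names[int(box.cls)]
    counts[name] = counts.get(name, 0) + 1
print(counts)  # e.g. {"cup": 2, "chair": 1}
```

Aggregating such per-frame counts over all sampled frames and participants would yield the kind of skewed frequency distribution over categories that the abstract describes.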
