Examining the precision of infants’ visual concepts by leveraging vision-language models and automated gaze coding

Abstract

Infants rapidly develop knowledge about the meanings of words in the first few years of life. Previous work has examined this word knowledge by measuring how much longer infants look at a named target image than at a distractor. Here, we examine the specificity of that knowledge by manipulating the similarity of the target and distractor. Automated gaze annotation and online data collection enabled us to measure looking behavior in 91 infants aged 14 to 24 months. Using a vision-language model to quantify target-distractor image and text similarity, we find that infants’ looking behavior is shaped by the high-level visual similarity of competitors: looking to the target image was inversely correlated with image similarity but not with visual saliency. Our findings demonstrate how multimodal models can be used to systematically examine the content of infants’ early visual representations.
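
As a concrete illustration of the similarity measure described in the abstract, the sketch below shows one way target-distractor image and text similarity could be quantified with a vision-language model. The specific checkpoint (openai/clip-vit-base-patch32), library (Hugging Face transformers), and file names are illustrative assumptions, not the authors’ actual pipeline.

```python
# Minimal sketch: cosine similarity between target and distractor stimuli using a
# CLIP-style vision-language model. Model choice and stimulus file names below are
# assumptions for illustration only.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def image_similarity(target_path: str, distractor_path: str) -> float:
    """Cosine similarity between image embeddings of the target and distractor."""
    images = [Image.open(target_path), Image.open(distractor_path)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # unit-normalize embeddings
    return float(emb[0] @ emb[1])

def text_similarity(target_label: str, distractor_label: str) -> float:
    """Cosine similarity between text embeddings of the two object labels."""
    inputs = processor(text=[target_label, distractor_label],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return float(emb[0] @ emb[1])

# Hypothetical usage: a "dog" target paired with a "cat" distractor should score
# higher on both measures than a "dog"/"car" pair.
# print(image_similarity("dog.jpg", "cat.jpg"), text_similarity("a dog", "a cat"))
```

Under this kind of measure, trial-level similarity scores can then be related to infants’ target looking, as the abstract describes.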
