Characterizing young children’s everyday activities using video question-answering models

Abstract

Children are remarkably efficient learners compared to our most advanced computational models of learning. One key difference is that children seem to leverage regularities in the activities (e.g., eating) in which they participate to learn about words or objects (e.g., “pomegranate”), even under skewed, long-tailed distributions. While everyday activities have long been theorized to be important supports for children’s learning, our understanding of the types, frequencies, and rhythms of these activities has been out of reach due to both a lack of naturalistic video datasets and the need for manual annotation. Here, we use the recently released BabyView dataset, a large egocentric video dataset of children’s everyday experience (31 children, 868 hours), and capitalize on innovations in video question-answering (VideoQA) models to quantify the what and where of children’s everyday experiences. Using these models, we classify both the activities (e.g., eating, dancing, exploring) and physical locations (e.g., living room, garage) in the infant’s view and generate natural-language descriptions for contiguous 10-second videos across the entire dataset. Notably, we find that (a) some activities and locations occur much more frequently than others, yet (b) there is wide variation across children. Moreover, (c) activities and locations exhibit structured transition probabilities (e.g., cooking often precedes eating), and (d) activities may decompose into distinct sub-clusters (e.g., different subtypes of reading). Compared with prior work analyzing static image content, our work highlights the advances possible by using VideoQA models to analyze the dynamic nature of children’s experiences. Our results provide a better understanding of children’s learning input in everyday contexts, informing developmentally inspired models of early learning and cognitive development.
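
To make finding (c) concrete, the following is a minimal sketch of how transition probabilities between activities could be estimated from a sequence of per-clip labels. It is an illustration only, not the authors' analysis: the abstract does not specify their estimation procedure, and the function name `transition_probabilities` and the example labels are hypothetical. The sketch assumes each contiguous 10-second clip has already been assigned a single activity label and estimates first-order probabilities P(next activity | current activity) from adjacent clips.

```python
from collections import Counter, defaultdict


def transition_probabilities(labels):
    """Estimate first-order transition probabilities P(next | current)
    from an ordered sequence of per-clip activity labels."""
    counts = defaultdict(Counter)
    # Count each adjacent (current, next) pair in the sequence.
    for current, nxt in zip(labels, labels[1:]):
        counts[current][nxt] += 1

    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize counts so each row sums to 1.
        probs[current] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


# Hypothetical labels for consecutive 10-second clips (not real data).
clip_labels = [
    "playing", "cooking", "cooking", "eating",
    "eating", "playing", "cooking", "eating",
]

probs = transition_probabilities(clip_labels)
for current, row in probs.items():
    print(current, row)
```

Because consecutive short clips often share the same label, self-transitions can dominate such a matrix. One possible design choice, again an assumption rather than something stated in the abstract, is to collapse runs of identical labels before counting so that the estimates reflect changes between activities.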
