Narrative Context Shifts Gaze from Visual to Semantic Salience
Abstract
Humans make over a hundred thousand eye movements daily to gather visual information. But what determines where we look? Current computational models typically link gaze behaviour to the visual features of isolated images, but eye movements are also strongly shaped by cognitive goals: observers gather information that helps them to understand, rather than merely represent, the world. Within this framework, observers should attend more to information that updates their understanding of the environment, and less to what is purely visually salient. Here we tested this hypothesis using a free-viewing paradigm with picture narratives in which we experimentally manipulated the meaningfulness of temporal context by presenting pictures either in a coherent (i.e., correct) order or in a temporally shuffled order. We developed a novel approach to quantify which visual information is semantically salient (i.e., important for understanding): we separately obtained language narratives for the images in each story and computed the contextual surprisal of visual objects using a large language model. We compared the ability of this semantic salience model to explain gaze behaviour against a state-of-the-art model of visual salience (DeepGaze-II). We found that individuals looked relatively more often and more quickly at semantically salient objects when images were presented in coherent compared to shuffled order. In contrast, visual salience did not account for gaze behaviour better in the coherent than in the shuffled order. These findings highlight how internal contextual models guide visual sampling and demonstrate that language models offer a powerful tool for capturing gaze behaviour in richer, meaningful settings.
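The core quantity in the semantic salience measure is contextual surprisal: the negative log probability a language model assigns to an object's mention given the narrative so far. The sketch below illustrates the arithmetic only; the paper uses a large language model, whereas here a hand-made toy conditional distribution stands in for the model's probabilities, and all object names and probability values are invented for illustration.

```python
import math

def surprisal(prob: float) -> float:
    """Contextual surprisal in bits: -log2 of the probability the
    (language) model assigns to the object given the story context."""
    return -math.log2(prob)

# Toy stand-in for LLM next-word probabilities, given a coherent
# kitchen-scene narrative vs. a shuffled (context-free) one.
# All values below are invented for illustration.
p_coherent = {"knife": 0.30, "umbrella": 0.01}
p_shuffled = {"knife": 0.05, "umbrella": 0.05}

# In a coherent context, an expected object is low-surprisal and an
# out-of-place object is high-surprisal; shuffling flattens the gap.
print(round(surprisal(p_coherent["knife"]), 2))
print(round(surprisal(p_coherent["umbrella"]), 2))
print(round(surprisal(p_shuffled["knife"]), 2))
```

Under this measure, the semantically salient objects in a coherent narrative are exactly those whose mention is hard to predict from the preceding context, which is why shuffling the picture order should weaken the link between surprisal and gaze.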