Looked but didn’t see: inattentional blindness and yes-bias confabulation in vision-language models

Jonathan D. Raymond
Dat Duong
Ping Hu
Benjamin D. Solomon

Abstract

Previous work showed that many participants fail to notice a gorilla in a video of people playing basketball. Another study found that 83% of trained radiologists failed to report a gorilla figure inserted into a chest CT nodule-search task, even though eye-tracking revealed that most observers had foveated the figure. We ask whether a similar phenomenon exists in contemporary vision-language models (VLMs). We find that (i) VLMs are capable of spotting the gorilla in both still-frame images and videos of lung CT scans; (ii) models display inattentional blindness, the extent of which varies with model generation and stimulus type; and (iii) Gemini-3.1-Pro outperforms most other flagship and open-weight VLMs at identifying the presence or absence of the gorilla. We additionally ran a segmentation experiment with two model classes: a generalist (SAM 3), which found the gorilla but produced little to no output for anatomy-based prompts, and a medical specialist (BiomedParse), which produced more promising anatomy-based results but flagged “gorilla” in 82% of frames of gorilla-free control videos. The behavioral signature of inattentional blindness therefore reproduces in VLMs, but a distinct yes-bias confabulation failure mode means that any “did the model see X” claim requires signal-detection analysis with a matched-control false-alarm baseline.
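As an illustration of what a signal-detection analysis with a matched-control false-alarm baseline can look like, the sketch below computes sensitivity (d′) and response criterion (c) from yes/no detection counts. It is not the authors' analysis code: the function name `sdt_summary`, the log-linear correction, and the example counts are all assumptions made for illustration.

```python
from statistics import NormalDist


def sdt_summary(hits, misses, false_alarms, correct_rejections):
    """Return (hit_rate, fa_rate, d_prime, criterion) for a yes/no task.

    hits / misses come from stimuli that contain the target (gorilla present);
    false_alarms / correct_rejections come from matched controls (gorilla absent).
    """
    # Log-linear correction (an assumption of this sketch): add 0.5 to each
    # count so that rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative = bias toward "yes"
    return hit_rate, fa_rate, d_prime, criterion


if __name__ == "__main__":
    # Hypothetical counts, not data from the preprint: a model that answers
    # "yes" on most target-present AND most target-absent trials.
    hr, far, d, c = sdt_summary(hits=45, misses=5,
                                false_alarms=40, correct_rejections=10)
    print(f"hit rate={hr:.2f}  false-alarm rate={far:.2f}  d'={d:.2f}  c={c:.2f}")
```

With counts like these, a high hit rate on its own would suggest the model "saw" the target, while the low d′ and negative criterion show that most of the apparent detection is a bias toward answering "yes".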

Version published to 10.64898/2026.06.16.26355792 on medRxiv
Jun 18, 2026

Hard-to-beat animacy perception: EEG evidence of accurate animate-inanimate distinction for ambiguous objects

This article has 5 authors:
1. Farzad Rostami
2. Céline Spriet
3. Hans Op de Beeck
4. Jean-Rémy Hochmann
5. Liuba Papeo
This article has no evaluations. Latest version: Jun 18, 2026
Distinctly perceptual possibilities: Amodal completion is disrupted by visual, but not cognitive, load

This article has 2 authors:
1. Jonathan Scott Phillips
2. Viola S. Störmer
This article has no evaluations. Latest version: Jul 7, 2026
Delaying the onset of aided target recognition highlights allows for a more dispersed allocation of overt attention

This article has 4 authors:
1. Chloe Callahan-Flintoft
2. Angela Jeter
3. Jessica L. Villarreal
4. Gabriella B. Larkin
This article has no evaluations. Latest version: Jul 6, 2026