The brain predicts visual speech units during naturalistic audiovisual speech listening

Abstract

Predictive processing is fundamental to language comprehension, yet research has focused primarily on auditory mechanisms, emphasizing how listeners anticipate upcoming phonemes and words based on the acoustic and linguistic structure of speech. However, speech is inherently multimodal, and here we show that language prediction is likewise multimodal. We recorded EEG while participants watched naturalistic continuous-speech videos with the speaker’s mouth visible or covered, to investigate whether the brain proactively predicts visual speech units (visemes) and how visual cues influence high-level linguistic prediction. Our results reveal that visemes are actively predicted beyond low-level mouth movements, with neural signatures emerging before visual articulation. Additionally, access to visual speech enhances semantic prediction, as reflected in stronger N400 responses. These findings demonstrate that speech comprehension relies on a hierarchical multimodal predictive architecture that integrates visual and auditory linguistic information. This framework advances our understanding of how the brain efficiently processes natural communication by dynamically incorporating visual cues to optimize linguistic expectations.
