Dynamics of auditory word form encoding in human speech cortex

Abstract

When we hear continuous speech, we perceive it as a series of discrete words, despite the lack of clear boundaries in the acoustic signal. The superior temporal gyrus (STG) encodes phonetic elements like consonants and vowels, but how it extracts whole words as perceptual units remains unclear. Using high-density cortical recordings, we investigated how the brain represents auditory word forms—integrating acoustic-phonetic, prosodic, and lexical features—while participants listened to spoken narratives. Our results show that STG neural populations exhibit a distinctive reset in activity at word boundaries, marked by a brief, sharp drop in cortical activity. Between these resets, the STG consistently encodes distinct acoustic-phonetic, prosodic, and lexical information, supporting the integration of phonological features into coherent word forms. Notably, this process tracks the relative elapsed time within each word, independent of its absolute duration, providing a flexible temporal scaffolding for encoding variable word lengths. We observed similar word form dynamics in the deeper layers of a self-supervised artificial speech network, suggesting a potential convergence with computational models. Additionally, in a bistable word perception task, STG responses were aligned with participants’ perceived word boundaries on a trial-by-trial basis, further emphasizing the role of dynamic encoding in word recognition. Together, these findings support a new dynamical model of auditory word forms, highlighting their importance as perceptual units for accessing linguistic meaning.
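
The notion of relative elapsed time within a word can be illustrated with a short sketch. This is not the authors' analysis code; it only assumes that word onset and offset times are known (for example from a transcript alignment), and the function name, argument names, and example boundaries are all hypothetical. Each time point is mapped to a value between 0 and 1 giving its position within the containing word, so that short and long words share a common time axis.

```python
from bisect import bisect_right


def relative_time_in_word(sample_times, word_onsets, word_offsets):
    """Map each sample time to its relative position (0 to 1) within its word.

    Hypothetical helper: word_onsets must be sorted, and words must not
    overlap. Samples that fall outside every word are mapped to None.
    """
    phases = []
    for t in sample_times:
        # Index of the last word that starts at or before time t.
        i = bisect_right(word_onsets, t) - 1
        if i >= 0 and t < word_offsets[i]:
            duration = word_offsets[i] - word_onsets[i]
            phases.append((t - word_onsets[i]) / duration)
        else:
            phases.append(None)
    return phases


# Hypothetical boundaries in seconds: a 0.2 s word and a 1.0 s word.
onsets = [0.0, 0.5]
offsets = [0.2, 1.5]

# 0.1 s and 1.0 s are both halfway through their words; 0.3 s is in a gap.
phases = relative_time_in_word([0.1, 1.0, 0.3], onsets, offsets)
```

In this sketch the midpoints of the short and the long word receive the same relative time, which is the sense in which an encoding could be independent of absolute word duration.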
