Neural Construction of Temporal Hierarchies in Speech Processing
Abstract
Understanding spoken language requires the brain to transform continuous acoustic input into hierarchically organized units such as syllables, words, and phrases. Neural oscillations are known to align with these units, but it remains unclear whether this alignment predominantly reflects structured linguistic representations, statistical regularities, prosodic patterns, or a combination of these factors, let alone how these factors interact. Using magnetoencephalography (MEG), we systematically manipulated the availability of structural cues (prosodic, statistical, and linguistic) in synthesized speech streams in both Dutch and Mandarin Chinese. We found that statistical regularities alone can elicit hierarchical neural tracking, but that distinct cortical spatiotemporal dynamics emerged when additional cues (prosodic or linguistic) were present. Neural phase and power responses jointly reflected the type and strength of the available cues, revealing how structured information sharpens the brain's temporal alignment to speech; phase and power nevertheless dissociated in their relationships to structure and content. Based on these neural findings, we propose a theoretical model outlining how temporal hierarchies are constructed across time, frequency, and space. Bivariate analyses and encoding simulations further validated our model and clarified how different types of cues are represented and integrated over time. Together, our MEG and modeling results suggest that the brain engages a generalized mechanism for organizing perceptual units in speech into temporal hierarchies, but that cortical dynamics are sensitive to the type of information available, as reflected in cue-dependent, coordinated changes in both phase and power across cortical regions.