Human-like sequential sound-to-meaning transfer drives artificial speech comprehension
Abstract
Artificial intelligence has reached a pivotal threshold. Multimodal large models can approach human-level speech comprehension by rapidly transforming sound into meaning. However, whether this process relies on human-like mechanisms remains unknown. Here, we compared the human brain with twelve speech language models (SLMs) using a phonology–semantics confusion paradigm. Stereo-electroencephalography revealed two mechanisms of phonology-to-semantics (P2S) transfer in the human brain: a local sequential transformation within specific neuronal populations, and a global cross-regional hierarchy of P2S representations. Only brain–model alignment on the local sequential mechanism predicted model performance. Correspondingly, targeted lesioning of local sequential P2S-transfer model units markedly impaired comprehension performance, while activation steering of these units improved performance. In addition, such local sequential P2S-transfer model units were identified across languages. Together, this study establishes local sequential P2S transformation as a fundamental computational principle shared across biological and artificial intelligence, offering a mechanistic bridge for future brain-inspired speech systems.