Human-like sequential sound-to-meaning transfer drives artificial speech comprehension


Abstract

Artificial intelligence has reached a pivotal threshold: multimodal large models can approach human-level speech comprehension by rapidly transforming sound into meaning. However, whether this process relies on human-like mechanisms remains unknown. Here, we compared the human brain with twelve speech language models (SLMs) using a phonology–semantics confusion paradigm. Stereo-electroencephalography revealed two mechanisms of phonology-to-semantics (P2S) transfer in the human brain: a local sequential transformation within specific neuronal populations, and a global cross-regional hierarchy of P2S representations. Only brain–model alignment with the local sequential mechanism predicted model performance. Consistent with this, targeted lesioning of model units showing local sequential P2S transfer markedly impaired comprehension, whereas activation steering of these units improved it. Such local sequential P2S-transfer units were also identified across languages. Together, this study establishes local sequential P2S transformation as a fundamental computational principle shared by biological and artificial intelligence, offering a mechanistic bridge toward future brain-inspired speech systems.
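The abstract does not say how lesioning or activation steering were implemented. As a generic illustration only, the sketch below shows two common ways such interventions are applied to a model's hidden activations: zero-ablation ("lesioning") of selected units, and additive steering of those units along a direction, here computed as a difference of condition means. The function names, the unit indices, the scaling factor `alpha`, and the mean-difference direction are all hypothetical choices for illustration, not the authors' procedure.

```python
import numpy as np

# Hypothetical helpers for illustration; not the method used in the study.

def lesion_units(hidden, unit_idx):
    """Zero-ablate the selected units (columns) of a (time x units) matrix."""
    out = hidden.copy()  # leave the original activations untouched
    out[:, unit_idx] = 0.0
    return out

def mean_difference_direction(acts_a, acts_b):
    """Assumed steering direction: mean activation in condition A minus condition B."""
    return acts_a.mean(axis=0) - acts_b.mean(axis=0)

def steer_units(hidden, unit_idx, direction, alpha):
    """Add alpha * direction to the selected units at every time step."""
    out = hidden.copy()
    # direction has one entry per selected unit and broadcasts over time steps
    out[:, unit_idx] += alpha * np.asarray(direction, dtype=out.dtype)
    return out

# Usage on toy activations: 3 time steps x 4 units
hidden = np.arange(12, dtype=float).reshape(3, 4)
lesioned = lesion_units(hidden, [1, 3])
steered = steer_units(hidden, [0, 2], [1.0, -1.0], alpha=2.0)
```

In practice such functions would be applied inside a model's forward pass (for example, via a hook on the layer of interest), and the units to target would come from an analysis like the brain–model comparison described above; both of those steps are outside the scope of this sketch.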
