Phonological Representations of Auditory and Visual Speech in the Occipito-temporal Cortex and Beyond

Abstract

Speech is a multisensory signal that can be extracted from the voice and the lips. Previous studies suggested that occipital and temporal regions encode both auditory and visual speech features, but the location and nature of these representations remain unclear. We characterized brain activity using fMRI in 24 participants (13 males, 11 females) to functionally and individually define the bilateral fusiform face areas (FFA), the left word-selective ventral occipito-temporal cortex (word-VOTC), an audiovisual speech region in the left superior temporal sulcus (lSTS), and control regions in the bilateral scene-selective parahippocampal place areas (PPA). In these regions, we performed multivariate pattern classification of corresponding phonemes (speech sounds) and visemes (lip movements). We observed that the word-VOTC and lSTS represent phonological information from both vision and audition. The multisensory nature of phonological representations appeared selective to the word-VOTC: we found viseme but not phoneme representation in the adjacent FFA, while the PPA did not encode phonology in either modality. Interestingly, cross-modal decoding revealed aligned phonological representations across the senses in the lSTS, but not in the word-VOTC. A whole-brain cross-modal searchlight analysis additionally revealed aligned audiovisual phonological representations in the bilateral pSTS and in left somato-motor cortex overlapping with oro-facial articulators. Altogether, our results demonstrate that auditory and visual phonology are represented in the word-VOTC, extending its functional coding beyond orthography. The geometries of auditory and visual representations do not align in the word-VOTC as they do in the STS and left somato-motor cortex, suggesting distinct representations across a distributed multisensory phonological network.
