Phonological Representations of Auditory and Visual Speech in the Occipito-temporal Cortex and Beyond

Abstract

Speech is a multisensory signal that can be extracted from the voice and the lips. Previous studies suggested that occipital and temporal regions encode both auditory and visual speech features, but the location and nature of these representations remain unclear. We characterized brain activity using fMRI in 24 participants (13 males, 11 females) to functionally and individually define the bilateral fusiform face areas (FFA), the left word-selective ventral occipito-temporal cortex (word-VOTC), an audiovisual speech region in the left superior temporal sulcus (lSTS), and control regions in the bilateral scene-selective parahippocampal place areas (PPA). In these regions, we performed multivariate pattern classification of corresponding phonemes (speech sounds) and visemes (lip movements). We observed that the word-VOTC and lSTS represent phonological information from both vision and audition. The multisensory nature of phonological representations appeared selective to the word-VOTC: we found viseme but not phoneme representation in the adjacent FFA, while the PPA did not encode phonology in either modality. Interestingly, cross-modal decoding revealed aligned phonological representations across the senses in the lSTS, but not in the word-VOTC. A whole-brain cross-modal searchlight analysis additionally revealed aligned audiovisual phonological representations in the bilateral pSTS and in left somato-motor cortex overlapping with oro-facial articulators. Altogether, our results demonstrate that auditory and visual phonology are represented in the word-VOTC, extending its functional coding beyond orthography. The geometries of auditory and visual representations do not align in the word-VOTC as they do in the STS and left somato-motor cortex, suggesting distinct representations across a distributed multisensory phonological network.
