Hearing in categories aids speech streaming at the “cocktail party”

Gavin M. Bidelman
Fallon Bernard
Kimberly Skubic

Abstract

Our perceptual system bins elements of the speech signal into categories to make speech perception manageable. Here, we aimed to test whether hearing speech in categories (as opposed to a continuous/gradient fashion) affords yet another benefit to speech recognition: parsing noisy speech at the “cocktail party.” We measured speech recognition in a simulated 3D cocktail party environment. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (1-4 talkers) and via forward vs. time-reversed maskers, promoting more and less informational masking (IM), respectively. In separate tasks, we measured isolated phoneme categorization using two-alternative forced choice (2AFC) and visual analog scaling (VAS) tasks designed to promote more/less categorical hearing and thus test putative links between categorization and real-world speech-in-noise skills. We first show that listeners can monitor only up to ∼3 talkers despite as many as 5 in the soundscape, and that streaming is not related to extended high-frequency hearing thresholds (though QuickSIN scores are). We then confirm that speech streaming accuracy and speed decline with additional competing talkers and amid forward compared to reversed maskers, which add IM. Dividing listeners into “discrete” vs. “continuous” categorizers based on their VAS labeling (i.e., whether responses were binary or continuous judgments), we then show that the degree of IM listeners experience at the cocktail party is predicted by their degree of categoricity in phoneme labeling; more discrete listeners are less susceptible to IM than their gradient-responding peers. Our results establish a link between speech categorization skills and cocktail party processing, with a categorical (rather than gradient) listening strategy benefiting degraded speech perception. These findings imply that figure-ground deficits common in many disorders might arise through a surprisingly simple mechanism: a failure to properly bin sounds into categories.

Version published to 10.1101/2024.04.03.587795 on bioRxiv
Apr 5, 2024
