A brain-inspired algorithm enhances automatic speech recognition performance in multi-talker scenes

Alexander D. Boyd
Kamal Sen

Abstract

Modern automatic speech recognition (ASR) systems perform impressively on clean speech but struggle in noisy, multi-talker environments, a challenge commonly referred to as the “cocktail party problem.” In contrast, many human listeners can solve this problem, suggesting that the brain implements a solution. Here we present a novel approach that uses a brain-inspired sound segregation algorithm (BOSSA) as a preprocessing step for a state-of-the-art ASR system (Whisper). We evaluated BOSSA’s impact on ASR accuracy in a spatialized multi-talker scene with one target speaker and two competing maskers, varying task difficulty by changing the target-to-masker ratio. We found that the median word error rate improved by up to 54% when the target-to-masker ratio was low. Our results indicate that brain-inspired algorithms have the potential to considerably enhance ASR accuracy in challenging multi-talker scenarios without the need to retrain or fine-tune existing state-of-the-art ASR systems.
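The abstract does not spell out how the target-to-masker ratio (TMR) or the word error rate (WER) is computed. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes TMR is the power ratio of the target to the summed maskers in dB, WER is the standard word-level edit distance divided by the number of reference words, and "improved by up to 54%" is read as a relative reduction of the median WER. All function names are hypothetical, and neither BOSSA nor Whisper is reproduced here.

```python
import math
import statistics
from typing import List, Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Mean power (average squared amplitude) of a signal."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_tmr(target: Sequence[float],
               maskers: Sequence[Sequence[float]],
               tmr_db: float) -> List[float]:
    """Add the summed maskers to the target at the requested TMR in dB.

    Assumption (hypothetical definition): TMR = 10*log10(P_target / P_maskers),
    where P_maskers is the power of the summed masker signal.
    All signals must have the same length.
    """
    n = len(target)
    if any(len(m) != n for m in maskers):
        raise ValueError("target and maskers must have the same length")
    masker_sum = [sum(m[i] for m in maskers) for i in range(n)]
    # Gain g satisfies P_target / (g**2 * P_maskers) = 10**(tmr_db / 10).
    gain = math.sqrt(mean_power(target)
                     / (mean_power(masker_sum) * 10 ** (tmr_db / 10)))
    return [t + gain * m for t, m in zip(target, masker_sum)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # At the start of row i, prev[j] is the edit distance between the first
    # i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_improvement(baseline_wer: float, processed_wer: float) -> float:
    """Fractional reduction of WER relative to the unprocessed baseline."""
    return (baseline_wer - processed_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts: references, raw ASR output, and ASR output
    # after a segregation front end. These are illustrative, not study data.
    refs = ["the cat sat on the mat", "a quick brown fox"]
    raw_hyps = ["the hat sat on mat", "a brown box"]
    processed_hyps = ["the cat sat on the mat", "a quick brown box"]

    baseline = statistics.median(word_error_rate(r, h) for r, h in zip(refs, raw_hyps))
    processed = statistics.median(word_error_rate(r, h) for r, h in zip(refs, processed_hyps))
    rel = relative_improvement(baseline, processed)
    print(f"relative median WER improvement: {rel:.0%}")
```

Scaling only the maskers keeps the target level fixed across conditions, so lowering `tmr_db` makes the task harder by raising the masker level. Taking the median over utterances, as the abstract reports, makes the summary less sensitive to a few utterances with very high WER.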

Version published on bioRxiv (DOI 10.1101/2025.07.15.664627)
Jul 16, 2025

Related articles

Rhythm modulates perception and neural tracking of speech in a speech-in-noise task

This article has 4 authors:
1. Eloise Schell
2. Tzu-Han Zoe Cheng
3. Yi Shen
4. T. Christina Zhao
This article has no evaluations
Latest version Jan 6, 2026

Remote Optical Decoding of Inner Speech in Broca’s Area via AI-based Speckle Pattern Analysis

This article has 7 authors:
1. Natalya Segal
2. Moshe Bar
3. Daniel Rubinstein
4. Sergey Agdarov
5. Yafim Beiderman
6. Yevgeny Beiderman
7. Zeev Zalevsky
This article has no evaluations
Latest version Jan 30, 2026