Crosstalk Suppression in a Multi-Channel, Multi-Speaker System Using Acoustic Vector Sensors
Abstract
Automatic speech recognition in scenarios with multiple speakers in a reverberant space, such as a small courtroom, often requires multiple sensors. This leads to the problem of crosstalk, which must be removed before speech-to-text transcription is performed. The proposed method uses Acoustic Vector Sensors to acquire the audio streams. Speaker detection is performed using statistical analysis of the direction of arrival, and this information is then used to perform source separation. Next, each speaker's activity in each channel is analyzed, and signal fragments containing direct speech and crosstalk are identified. Crosstalk is then suppressed using a dynamic gain processor, and the resulting audio streams may be passed to a speech recognition system. The algorithm was evaluated on a custom set of speech recordings. An increase in SI-SDR over the unprocessed signal was achieved: 7.54 dB and 19.53 dB for the algorithm with and without the source separation stage, respectively. The algorithm is intended for multi-speaker scenarios requiring speech-to-text transcription, such as court sessions or conferences.
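For readers unfamiliar with the evaluation metric, SI-SDR (scale-invariant signal-to-distortion ratio) compares the energy of an optimally scaled reference signal to the energy of the residual distortion, so the score does not depend on the overall gain of the processed channel. The sketch below is a minimal implementation assuming the widely used definition of Le Roux et al. (2019); the function name and array interface are illustrative and are not taken from the paper's evaluation code.

```python
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-Invariant Signal-to-Distortion Ratio in dB.

    Assumes the standard scale-invariant formulation: the reference is
    rescaled by its optimal projection onto the estimate before the
    target/distortion energy ratio is computed.
    """
    # Remove DC offset from both signals
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Optimal scaling factor projecting the reference onto the estimate
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference        # scaled target component
    distortion = estimate - target    # residual (e.g. crosstalk, artifacts)
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2))
```

Under these assumptions, the improvement figures quoted in the abstract would correspond to si_sdr(processed, clean) minus si_sdr(unprocessed, clean) for each channel, averaged over the test recordings; the paper's exact evaluation protocol is described in the full text, not here.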