Speech Separation and Enhancement using Deep Neural Networks
Abstract
In this paper, two methods are proposed to address speech enhancement in reverberant environments. In the first method, geometric information about the positions of the target speaker and the microphone is used to estimate the direct-path impulse response, from which the direct-path speech is calculated. Based on the direct-path speech, the DRM, a new training target, is computed. Experimental results confirm that the DRM outperforms the state-of-the-art method. In the second method, LSTM networks are introduced to address speaker-independent speech enhancement in real reverberant room environments. Two T-F masks are trained separately in the LSTM models for the speech enhancement task. The proposed method is evaluated on independent signals and real RIRs to confirm its generalization ability. The experimental results show that the proposed LSTM method outperforms the state-of-the-art DNN method. In the next chapter, a multi-scale CNN will be presented to capture features at different scales.
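To make the first method concrete, the following is a minimal sketch of how a ratio-mask-style training target could be computed once the direct-path speech is available. It assumes the DRM is defined on the STFT magnitudes of the direct-path and reverberant signals; the function name, parameters, and the clipping to [0, 1] are illustrative assumptions, not the paper's exact definition.

```python
# Sketch: ratio-mask-style training target from direct-path speech.
# Assumption: the mask is a magnitude ratio between the direct-path STFT
# and the reverberant STFT, bounded like an ideal ratio mask.
import numpy as np
from scipy.signal import stft

def ratio_mask_target(direct_path, reverberant, fs=16000, n_fft=512, eps=1e-8):
    """Compute a [0, 1]-bounded T-F mask from direct-path and reverberant speech."""
    _, _, D = stft(direct_path, fs=fs, nperseg=n_fft)   # STFT of direct-path speech
    _, _, R = stft(reverberant, fs=fs, nperseg=n_fft)   # STFT of reverberant speech
    mask = np.abs(D) / (np.abs(R) + eps)                 # magnitude ratio per T-F unit
    return np.clip(mask, 0.0, 1.0)                       # bound the target for training
```

During training, such a mask would serve as the regression target for a network that sees only the reverberant features.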
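For the second method, the sketch below shows one way an LSTM-based T-F mask estimator could be structured. The layer sizes, the use of log-magnitude input features, and the class name are assumptions for illustration; per the abstract, two such networks would be trained separately, one per mask.

```python
# Sketch: LSTM-based T-F mask estimator (PyTorch).
# Assumption: input is a sequence of log-magnitude STFT frames and the
# output is a sigmoid mask of the same frequency dimension.
import torch
import torch.nn as nn

class LSTMMaskEstimator(nn.Module):
    def __init__(self, n_freq=257, hidden=512, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, log_mag):                 # log_mag: (batch, frames, n_freq)
        h, _ = self.lstm(log_mag)               # temporal modelling across frames
        return torch.sigmoid(self.out(h))       # mask in [0, 1] per T-F unit
```

At inference time, the estimated mask would be applied to the noisy magnitude spectrogram and combined with the noisy phase for resynthesis.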