Speech Separation and Enhancement using Deep Neural Networks
Abstract
In this paper, two methods are proposed to address speech enhancement in reverberant environments. In the first method, geometric information about the positions of the target speaker and the microphone is used to estimate the direct-path impulse response, from which the direct-path speech is calculated. Based on the direct-path speech, the DRM, a new training target, is computed. Experimental results confirm that the DRM outperforms the state-of-the-art method. In the second method, LSTM networks are introduced to address speaker-independent speech enhancement in real reverberant room environments. Two T-F masks are trained separately in the LSTM models for the speech enhancement task. The proposed method is evaluated on independent signals and real RIRs to confirm its generalization ability. The experimental results show that the proposed LSTM method outperforms the state-of-the-art DNN method. In the next chapter, a multi-scale CNN will be presented to capture features at different scales.
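To make the first method concrete, the following is a minimal sketch of how a ratio-mask-style training target could be computed once the direct-path speech is available. It assumes the DRM is defined on the STFT magnitudes of the direct-path and reverberant signals; the function name, parameters, and the clipping to [0, 1] are illustrative assumptions, not the paper's exact definition.

```python
# Sketch: ratio-mask-style training target from direct-path speech.
# Assumption: the mask is a magnitude ratio between the direct-path STFT
# and the reverberant STFT, bounded like an ideal ratio mask.
import numpy as np
from scipy.signal import stft

def ratio_mask_target(direct_path, reverberant, fs=16000, n_fft=512, eps=1e-8):
    """Compute a [0, 1]-bounded T-F mask from direct-path and reverberant speech."""
    _, _, D = stft(direct_path, fs=fs, nperseg=n_fft)   # STFT of direct-path speech
    _, _, R = stft(reverberant, fs=fs, nperseg=n_fft)   # STFT of reverberant speech
    mask = np.abs(D) / (np.abs(R) + eps)                 # magnitude ratio per T-F unit
    return np.clip(mask, 0.0, 1.0)                       # bound the target for training
```

During training, such a mask would serve as the regression target for a network that sees only the reverberant features.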
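For the second method, the sketch below shows one way an LSTM-based T-F mask estimator could be structured. The layer sizes, the use of log-magnitude input features, and the class name are assumptions for illustration; per the abstract, two such networks would be trained separately, one per mask.

```python
# Sketch: LSTM-based T-F mask estimator (PyTorch).
# Assumption: input is a sequence of log-magnitude STFT frames and the
# output is a sigmoid mask of the same frequency dimension.
import torch
import torch.nn as nn

class LSTMMaskEstimator(nn.Module):
    def __init__(self, n_freq=257, hidden=512, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, log_mag):                 # log_mag: (batch, frames, n_freq)
        h, _ = self.lstm(log_mag)               # temporal modelling across frames
        return torch.sigmoid(self.out(h))       # mask in [0, 1] per T-F unit
```

At inference time, the estimated mask would be applied to the noisy magnitude spectrogram and combined with the noisy phase for resynthesis.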