Experimental investigation of Deep Neural Networks inspired Supervised and Semi-Supervised Cocktail Party Problem based Speech Separation

Abstract

Separating multiple speakers who talk simultaneously over background noise is a difficult problem, particularly in modern multimedia applications. The cocktail party problem (CPP), the task of identifying a target speaker among multiple speakers, is the standard formulation adopted by many researchers. Traditional speech separation methods have been applied to the CPP: non-negative matrix factorization (NMF) and computational auditory scene analysis (CASA) for single-channel processing, and blind source separation (BSS) and independent component analysis (ICA) for multi-channel processing. Speech separation using deep neural networks is an active area of research with the potential to significantly improve speech processing. This work introduces a novel approach for single-channel speech separation that separates the speakers from a mixed speech signal. Two approaches are proposed, one based on supervised learning and the other on unsupervised learning, and they are compared using perceptual evaluation of speech quality (PESQ), scale-invariant signal-to-noise ratio improvement (SI-SNRi), signal-to-distortion ratio improvement (SDRi), and short-time objective intelligibility (STOI). The experiments are conducted on the TIMIT dataset, with the data mixed at SNRs ranging from −5 dB to 5 dB. The results are analyzed using both objective and subjective evaluation to assess the efficacy of the proposed work.
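
The mixing condition and one of the metrics named in the abstract can be illustrated with a minimal sketch. It assumes NumPy and the commonly used definition of scale-invariant SNR; the function names (`mix_at_snr`, `si_snr`, `si_snr_improvement`) and the sine-tone stand-ins for speech are hypothetical and are not taken from the preprint, whose actual mixing and evaluation pipeline is not described on this page.

```python
import numpy as np


def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so the target-to-interferer power ratio is `snr_db`, then add."""
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)
    p_target = np.mean(target ** 2)
    p_interferer = np.mean(interferer ** 2)
    # Gain that makes 10*log10(p_target / (gain**2 * p_interferer)) equal snr_db.
    gain = np.sqrt(p_target / (p_interferer * 10.0 ** (snr_db / 10.0)))
    return target + gain * interferer


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB (common definition, assumed here)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    s_target = scale * reference
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


if __name__ == "__main__":
    # Hypothetical stand-ins for two speakers: one second of two tones at 16 kHz.
    t = np.arange(16000) / 16000.0
    speaker_a = np.sin(2 * np.pi * 220.0 * t)
    speaker_b = np.sin(2 * np.pi * 330.0 * t)
    for snr_db in (-5, 0, 5):
        mixture = mix_at_snr(speaker_a, speaker_b, snr_db)
        print(snr_db, si_snr(mixture, speaker_a))
```

SI-SNRi reports how much a separated estimate improves on the raw mixture, so a model that returns the mixture unchanged scores 0 dB regardless of the mixing SNR.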
