Experimental investigation of Deep Neural Networks inspired Supervised and Semi-Supervised Cocktail Party Problem based Speech Separation

Jaipreet Kour Wazir
Javaid A Sheikh

Abstract

Separating multiple speakers who talk simultaneously in the presence of background noise is a difficult problem, particularly in the modern multimedia world. The cocktail party problem (CPP), in which a target speaker must be identified among multiple speakers, is the standard formulation used by many researchers. Traditional speech separation methods have been applied to the CPP: non-negative matrix factorization (NMF) and computational auditory scene analysis (CASA) for single-channel processing, and blind source separation (BSS) and independent component analysis (ICA) for multi-channel processing. Speech separation using deep neural networks is an active area of research with the potential to significantly improve speech processing. This work introduces a novel approach for single-channel speech separation that separates the speakers from a mixed speech signal. Two approaches are proposed, one based on supervised learning and the other on unsupervised learning. Both approaches are compared using perceptual evaluation of speech quality (PESQ), scale-invariant signal-to-noise ratio improvement (SI-SNRi), signal-to-distortion ratio improvement (SDRi), and short-time objective intelligibility (STOI). The experiments are conducted on the TIMIT dataset, with the data mixed at SNRs ranging from −5 dB to 5 dB. The results are analyzed using both objective and subjective evaluation to assess the efficacy of the work.
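
The mixing and scoring steps named in the abstract can be illustrated with a short sketch. This is not the authors' code: the function names (`mix_at_snr`, `si_snr`, `si_snr_improvement`) are hypothetical, random noise stands in for TIMIT utterances, and the SI-SNR definition used is the common projection-based one, which the abstract does not spell out.

```python
import numpy as np


def mix_at_snr(target, interferer, snr_db):
    """Add `interferer` to `target` so their power ratio equals `snr_db` (in dB)."""
    target_power = np.mean(target ** 2)
    interferer_power = np.mean(interferer ** 2)
    # Choose gain g so that target_power / (g^2 * interferer_power) = 10^(snr_db / 10).
    gain = np.sqrt(target_power / (interferer_power * 10 ** (snr_db / 10)))
    return target + gain * interferer


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB (assumed projection-based definition)."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference; the remainder counts as noise.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    projection = scale * reference
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10 * np.log10(ratio)


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Stand-in signals (assumption): one second of noise at 16 kHz per "speaker".
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
interferer = rng.standard_normal(16000)
mixture = mix_at_snr(target, interferer, snr_db=-5.0)
```

In an actual experiment, `estimate` would be the output of a separation model, and `si_snr_improvement(estimate, mixture, target)` would be averaged over the test mixtures at each SNR in the −5 dB to 5 dB range.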

Version published to 10.21203/rs.3.rs-8170302/v1 on Research Square
Dec 8, 2025

Related articles

Self-Supervised Audio Representation Learning Model Based on Time-Frequency Decoupling and Masked Reconstruction

This article has 3 authors:
1. Jie Xu
2. Yuhao Dai
3. Zhifeng Wang
This article has no evaluations. Latest version: Dec 31, 2025

Fake Voice Detection: A Comparative Analysis of Complex-Valued Deep Learning and Transformer Models across Multiple Languages

This article has 5 authors:
1. Mario Jojoa
2. Alfonso Bahillo
3. Dávid Sztahó
4. Giovanni Hernandez
5. Géza Nemeth
This article has no evaluations. Latest version: Feb 3, 2026

Remote Optical Decoding of Inner Speech in Broca’s Area via AI-based Speckle Pattern Analysis

This article has 7 authors:
1. Natalya Segal
2. Moshe Bar
3. Daniel Rubinstein
4. Sergey Agdarov
5. Yafim Beiderman
6. Yevgeny Beiderman
7. Zeev Zalevsky
This article has no evaluations. Latest version: Jan 30, 2026
