Towards Advanced Speech Signal Processing: A Statistical Perspective on Convolution-Based Architectures and Its Applications

Kapu Nirmal Joshua
Raghav Karan

Abstract

This article surveys convolution-based models, namely convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs, as speech signal processing models, and presents their statistical background along with their applications in speech recognition, speaker identification, emotion recognition, and speech enhancement. Through a comparative assessment of training cost, model size, accuracy, and speed, we weigh the strengths and weaknesses of each model, identify potential sources of error, and propose avenues for further research, emphasising the central role that convolution-based architectures play in advancing applications of speech technologies.

Version published to 10.20944/preprints202406.1105.v3
Jan 6, 2025
Version published to 10.20944/preprints202406.1105.v2
Oct 31, 2024
Version published to 10.20944/preprints202406.1105.v1
Jun 17, 2024

Fake Voice Detection: A Comparative Analysis of Complex-Valued Deep Learning and Transformer Models across Multiple Languages

This article has 5 authors:
1. Mario Jojoa
2. Alfonso Bahillo
3. Dávid Sztahó
4. Giovanni Hernandez
5. Géza Nemeth
This article has no evaluations. Latest version: Feb 3, 2026

Deepfake Audio Detection Using Machine Learning and Deep Learning Methods

This article has 1 author:
1. Mainul Islam
This article has no evaluations. Latest version: Jan 6, 2026

Remote Optical Decoding of Inner Speech in Broca’s Area via AI-based Speckle Pattern Analysis

This article has 7 authors:
1. Natalya Segal
2. Moshe Bar
3. Daniel Rubinstein
4. Sergey Agdarov
5. Yafim Beiderman
6. Yevgeny Beiderman
7. Zeev Zalevsky
This article has no evaluations. Latest version: Jan 30, 2026