Deepfake Audio Detection Using Machine Learning and Deep Learning Methods

Mainul Islam


Abstract

Deepfake audio threatens the authenticity of audio recordings, the credibility of information sources, and the security of individuals and organizations. Detecting deepfake audio is challenging because rapid advances in artificial intelligence enable increasingly sophisticated techniques for generating highly realistic and deceptive audio. In this study, we assess the effectiveness of machine learning, deep learning, and a stacking ensemble method for detecting deepfake speech, using a Bangla deepfake audio speech dataset. We explored two feature representations of speech: Mel-frequency cepstral coefficients (MFCCs) and mel spectrograms. Both capture the spectral characteristics of audio signals and are well suited as input to classification algorithms. We evaluated four classification algorithms (SVM, KNN, BiLSTM, and GRU) and tuned their hyperparameters to improve performance. Among the algorithms we implemented, GRU outperformed the others with an accuracy of 99.67%.
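The two feature representations named in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's pipeline: the 16 kHz sample rate, 512-point FFT, 160-sample hop, Hann window, 40 mel bands, HTK-style mel formula, and 13 MFCCs are all assumptions chosen for illustration. The log-mel spectrogram is the framed power spectrum projected onto triangular mel filters, and the MFCCs are an orthonormal DCT-II of that log-mel spectrogram.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Slice the signal into overlapping frames and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum of each frame, then projection onto the mel filters
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # shape: (n_frames, n_mels)


def mfcc(log_mel, n_mfcc=13):
    # Orthonormal DCT-II along the mel axis, keeping the first n_mfcc coefficients
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels)) * np.sqrt(2.0 / n_mels)
    basis[0] /= np.sqrt(2.0)
    return log_mel @ basis.T  # shape: (n_frames, n_mfcc)
```

For one second of 16 kHz audio, `log_mel_spectrogram` returns a 97 × 40 matrix and `mfcc` a 97 × 13 matrix; either can be flattened or pooled for SVM/KNN, or fed frame by frame to a recurrent model such as BiLSTM or GRU.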
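The abstract reports that a GRU performed best. The following NumPy sketch shows only how a single-layer GRU consumes a sequence of feature frames (for example, MFCC frames) and maps its final hidden state to a real/fake probability. It is an untrained illustration, not the author's model: the random weight initialization, the hidden size, the logistic output layer, and the convention that the output is the probability of "fake" are all assumptions, and training, bidirectional variants, and the stacking ensemble are not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUClassifier:
    """Untrained single-layer GRU with a logistic output (illustrative only)."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        # One (W, U, b) set per gate: 0 = update z, 1 = reset r, 2 = candidate state
        self.W = rng.uniform(-scale, scale, (3, n_hidden, n_in))
        self.U = rng.uniform(-scale, scale, (3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))
        # Random placeholder output layer (would be learned in practice)
        self.w_out = rng.uniform(-scale, scale, n_hidden)
        self.b_out = 0.0

    def forward(self, x):
        # x: (n_frames, n_in), e.g. the MFCC frames of one utterance
        h = np.zeros(self.U.shape[1])
        for x_t in x:
            z = sigmoid(self.W[0] @ x_t + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x_t + self.U[1] @ h + self.b[1])
            h_tilde = np.tanh(self.W[2] @ x_t + self.U[2] @ (r * h) + self.b[2])
            # Interpolate between the previous state and the candidate state
            h = (1.0 - z) * h + z * h_tilde
        # Final hidden state -> probability that the clip is fake (assumed convention)
        return sigmoid(self.w_out @ h + self.b_out)
```

For example, `GRUClassifier(n_in=13, n_hidden=32, rng=np.random.default_rng(0)).forward(frames)` returns a value in (0, 1) for a `frames` array of shape `(n_frames, 13)`. The gating lets the model decide, frame by frame, how much of its running summary to keep, which is one plausible reason recurrent models suit variable-length speech features.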

Version published on Research Square (DOI: 10.21203/rs.3.rs-8462841/v1), Jan 6, 2026

