Deepfake Audio Detection Using Machine Learning and Deep Learning Methods

Abstract

Deepfake audio has become a threat to the authenticity of audio recordings, the credibility of information sources, and the security of individuals and organizations. Detecting deepfake audio is challenging because rapid advances in artificial intelligence enable increasingly sophisticated techniques for generating highly realistic and deceptive audio content. In this study, we assess the effectiveness of machine learning, deep learning, and a stacking ensemble method for detecting deepfake audio speech. We used a Bangla deepfake audio speech dataset in this experiment. We explored two feature representations of audio speech: Mel-frequency cepstral coefficients (MFCCs) and Mel spectrograms. These features are effective at capturing the spectral characteristics of audio signals and are well suited to classification algorithms. We evaluated four classifiers: support vector machine (SVM), k-nearest neighbors (KNN), bidirectional long short-term memory (BiLSTM), and gated recurrent unit (GRU). Furthermore, we tuned the hyperparameters of the algorithms to improve performance. Among the algorithms we implemented, GRU outperformed the others with an accuracy of 99.67%.
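To make the feature-extraction step concrete, the sketch below computes a Mel spectrogram and MFCCs from scratch with NumPy. It is an illustration of the standard pipeline (framing, power spectrum, triangular Mel filterbank, log, DCT-II), not the authors' exact configuration; the sample rate, FFT size, hop length, and coefficient counts are assumed defaults, and the 440 Hz tone stands in for a real speech clip.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                     # rising edge
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                     # falling edge
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    # Frame the signal, window it, and take the power spectrum per frame.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hamming(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames)                      # (n_frames, n_fft//2 + 1)
    mel_spec = power @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(mel_spec + 1e-10)            # log Mel spectrogram
    # DCT-II over the Mel bands yields the cepstral coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_mel @ dct.T                        # (n_frames, n_ceps)

# Toy input: 1 s of a 440 Hz tone in place of a real speech recording.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone, sr=sr)
print(features.shape)  # one 13-dimensional MFCC vector per frame
```

The resulting per-frame feature matrix is what a classifier such as an SVM (after pooling over frames) or a recurrent model such as a BiLSTM or GRU (consuming the frame sequence directly) would take as input.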
