Deepfake Audio Detection Using Machine Learning and Deep Learning Methods
Abstract
Deepfake audio threatens the authenticity of audio recordings, the credibility of information sources, and the security of individuals and organizations. Detecting it is challenging because rapid advances in artificial intelligence enable increasingly sophisticated techniques for generating highly realistic and deceptive audio content. In this study, we assess the effectiveness of machine learning, deep learning, and a stacking ensemble method for detecting deepfake speech audio. We used a Bangla deepfake speech dataset in this experiment. We explored two feature representations of speech audio: Mel-frequency cepstral coefficients (MFCCs) and the mel spectrogram. These features capture the spectral characteristics of audio signals effectively, which makes them well suited as inputs to classification algorithms. We explored four classification algorithms in our experiment: SVM, KNN, BiLSTM, and GRU. Furthermore, we tuned the hyperparameters of the algorithms to improve performance. Among the algorithms we implemented, GRU performed best, with an accuracy of 99.67%.
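To make the two feature representations concrete, the following is a minimal sketch of how mel-spaced filter centers and cepstral coefficients can be computed. It is not the pipeline used in this study: the HTK-style mel formula, the unnormalized DCT-II, the function names, and the parameter values (40 filters, 0 to 8000 Hz, 13 coefficients) are all illustrative assumptions.

```python
import math

def hz_to_mel(hz):
    # HTK-style mel scale; an assumption, other variants exist.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_mels, f_min, f_max):
    # Center frequencies (Hz) of n_mels filters spaced evenly on the mel scale.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

def dct_ii(values, n_coeffs):
    # Unnormalized DCT-II of a sequence, keeping the first n_coeffs terms.
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]

def mfcc_from_mel_energies(mel_energies, n_coeffs=13):
    # MFCCs for one frame: log of the mel filter-bank energies, then DCT-II.
    # The small constant guards against log(0) for silent bands.
    log_energies = [math.log(e + 1e-10) for e in mel_energies]
    return dct_ii(log_energies, n_coeffs)

# Hypothetical settings: 40 filters between 0 Hz and 8000 Hz.
centers = mel_center_frequencies(40, 0.0, 8000.0)
```

In this sketch, one frame of a mel spectrogram corresponds to the list of filter-bank energies passed to `mfcc_from_mel_energies`, and the MFCCs are a compact, decorrelated summary of that same frame. The filter centers are packed more densely at low frequencies, which is the property that distinguishes the mel scale from a linear frequency axis.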