Deepfake audio detection and justification with Explainable Artificial Intelligence

Abstract

Deepfake audio refers to synthetically generated audio, often used in hoaxes to impersonate human voices. This paper generates fake audio from the Fake or Real (FoR) dataset using Generative Adversarial Networks (GANs). The FoR dataset offers a diversity of speakers across 195,000 samples. The proposed work analyses the quality of the generated fake data using the Fréchet Audio Distance (FAD) score; a FAD score of 23.814 indicates that the generator produces good-quality fake audio. The study further enables glass-box analysis of deepfake audio detection through the Explainable Artificial Intelligence (XAI) methods LIME, SHAP and Grad-CAM. This research assists in understanding the impact of frequency bands on audio classification through a quantitative analysis of Shapley values and a qualitative comparison of the explainability masks produced by LIME and Grad-CAM. The FAD metric provides a quantitative evaluation of generator performance. Together, XAI and FAD support the development of deepfake audio detection with GANs under minimal data input. The results of this research are applicable to the detection of phishing audio calls and voice impersonation.
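The FAD metric mentioned above is the Fréchet distance between two multivariate Gaussians fitted to embeddings of real and generated audio (typically VGGish embeddings). The sketch below shows the core computation from pre-extracted embedding matrices; the embedding model itself, the array shapes, and the function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy import linalg

def frechet_audio_distance(emb_real: np.ndarray, emb_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_real, emb_fake: (n_samples, dim) arrays of audio embeddings,
    e.g. from a pretrained VGGish model (assumed, not shown here).
    """
    mu_r, mu_f = emb_real.mean(axis=0), emb_fake.mean(axis=0)
    cov_r = np.cov(emb_real, rowvar=False)
    cov_f = np.cov(emb_fake, rowvar=False)
    # Matrix square root of the covariance product; small imaginary
    # parts from numerical error are discarded.
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Toy check with synthetic embeddings: identical sets give ~0,
# while a shifted distribution yields a clearly larger distance.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(500, 8))
b = rng.normal(0.0, 1.0, size=(500, 8))
fad_identical = frechet_audio_distance(a, a)
fad_shifted = frechet_audio_distance(a, b + 3.0)
```

A lower FAD means the fake-audio embedding distribution sits closer to the real one, which is why the score serves as a quantitative proxy for generator quality.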