Explainability-Driven Adversarial Robustness Assessment for Generalized Deepfake Detectors
Abstract
The capability of generative models to produce high-quality fake images requires deepfake detectors to be accurate and to generalize well. Moreover, the explainability and adversarial robustness of deepfake detectors are critical for deploying such models in real-world scenarios. In this paper, we propose a framework that leverages explainability to assess the adversarial robustness of deepfake detectors. Specifically, we apply feature attribution methods to identify the image regions the model focuses on to make its prediction. We then use the generated heatmaps to perform an explainability-driven attack, perturbing the most relevant and the most irrelevant regions with gradient-based adversarial techniques. We feed the model with the resulting adversarial images and measure the accuracy drop and the attack success rate. We test our methodology on state-of-the-art models with strong generalization abilities, providing a comprehensive, explainability-driven evaluation of their robustness. Experimental results show that explainability analysis serves as a tool to reveal vulnerabilities of generalized deepfake detectors to adversarial attacks.
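To make the pipeline concrete, the following is a minimal sketch of an explainability-driven attack, not the authors' implementation: it assumes a PyTorch image classifier, uses plain gradient saliency as the feature attribution method and FGSM as the gradient-based attack, and restricts the perturbation to the most (or least) relevant pixels before measuring the accuracy drop and attack success rate. The function names (`attribution_map`, `region_mask`, `masked_fgsm`) and the toy model in the demo are illustrative placeholders.

```python
import torch
import torch.nn.functional as F


def attribution_map(model, x, y):
    # Simple gradient-based feature attribution (saliency): absolute input
    # gradient of the target-class score, summed over color channels.
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y.view(-1, 1)).sum()
    score.backward()
    return x.grad.abs().sum(dim=1, keepdim=True)  # shape (B, 1, H, W)


def region_mask(heatmap, top_fraction=0.1, most_relevant=True):
    # Binary mask selecting the top (or bottom) fraction of attributed pixels.
    b = heatmap.size(0)
    flat = heatmap.view(b, -1)
    k = max(1, int(top_fraction * flat.size(1)))
    thresh = flat.topk(k, dim=1, largest=most_relevant).values[:, -1:]
    mask = flat >= thresh if most_relevant else flat <= thresh
    return mask.view_as(heatmap).float()


def masked_fgsm(model, x, y, mask, eps=8 / 255):
    # One-step FGSM perturbation applied only inside the selected regions.
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x + eps * x_adv.grad.sign() * mask).clamp(0, 1).detach()


@torch.no_grad()
def evaluate(model, x, y):
    # Per-sample correctness of the detector's predictions.
    return (model(x).argmax(1) == y).float()


if __name__ == "__main__":
    # Toy demo with an untrained CNN and random data, standing in for a real
    # deepfake detector and dataset.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
        torch.nn.Linear(8, 2),
    )
    x = torch.rand(4, 3, 64, 64)
    y = torch.randint(0, 2, (4,))

    clean_correct = evaluate(model, x, y)
    heat = attribution_map(model, x, y)
    adv = masked_fgsm(model, x, y, region_mask(heat, most_relevant=True))
    adv_correct = evaluate(model, adv, y)

    print("accuracy drop:", (clean_correct.mean() - adv_correct.mean()).item())
    print("attack success rate:",
          ((clean_correct == 1) & (adv_correct == 0)).float().mean().item())
```

Swapping `most_relevant=True` for `False` in the `region_mask` call perturbs the least relevant regions instead, which allows comparing how much the detector's decision depends on the areas highlighted by the attribution method.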