Evaluating the Effectiveness of Explainable AI for Adversarial Attack Detection in Traffic Sign Recognition Systems

Abstract

Connected autonomous vehicles (CAVs) rely on deep neural network-based perception systems to operate safely in complex driving environments. However, these systems remain vulnerable to adversarial perturbations that can induce misclassification without perceptible changes to human observers. Explainable artificial intelligence (XAI) has been proposed as a potential adversarial detection mechanism, on the premise that adversarial inputs expose inconsistencies in model attention. This study evaluated the effectiveness of NoiseCAM-based explanation-space detection on the German Traffic Sign Recognition Benchmark (GTSRB) using a single CNN architecture operating on 32 × 32 inputs. Adversarial examples were generated with the Fast Gradient Sign Method (FGSM) under perturbation budgets ϵ = 0.01–0.10, and detection performance was evaluated using accuracy, precision, recall, F1-score, and ROC–AUC. Results show that NoiseCAM achieves detection accuracies between 51.8% and 52.9% with ROC–AUC values of 0.52–0.53, only marginally above random discrimination (0.5). Class-wise analysis further reveals substantial variability in detection reliability across traffic sign categories, with visually structured regulatory signs exhibiting higher separability than complex warning signs. These findings suggest that explanation-space inconsistencies alone provide limited adversarial detection capability in low-resolution, safety-critical perception pipelines. The study contributes to understanding the operational limits of explanation-based adversarial detection and highlights the need to integrate XAI signals with complementary robustness or uncertainty-aware mechanisms for reliable deployment in autonomous driving systems.
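
For readers who want a concrete picture of the evaluation pipeline, the sketch below (not the authors' code) illustrates the kind of workflow the abstract describes: FGSM perturbation under a budget ϵ, a Grad-CAM-style attention map as a rough stand-in for NoiseCAM, an explanation-inconsistency score computed between an input and a lightly smoothed copy, and ROC–AUC evaluation of clean-versus-adversarial discrimination. The model, the feature-layer handle, the smoothing-based score, and the default ϵ value are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: FGSM generation plus an explanation-inconsistency
# detector evaluated with ROC-AUC. A Grad-CAM-style map is used as an assumed
# stand-in for NoiseCAM.
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score


def fgsm(model, x, y, eps):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    model.zero_grad(set_to_none=True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()


def gradcam_map(model, feature_layer, x):
    """Grad-CAM-style attention map for the predicted class (stand-in for NoiseCAM)."""
    feats, grads = [], []
    h1 = feature_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = feature_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model.zero_grad(set_to_none=True)
    logits = model(x)
    logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum().backward()
    h1.remove()
    h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)   # per-channel importance
    cam = F.relu((weights * feats[0]).sum(dim=1))        # (B, H, W)
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)


def inconsistency_score(model, feature_layer, x):
    """Assumed detection score: mean absolute difference between the attention
    map of x and the map of a lightly smoothed copy of x."""
    smoothed = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
    cam_x = gradcam_map(model, feature_layer, x)
    cam_s = gradcam_map(model, feature_layer, smoothed)
    return (cam_x - cam_s).abs().mean(dim=(1, 2))


def detector_auc(model, feature_layer, clean_x, clean_y, eps=0.03):
    """Score clean and FGSM inputs, then report ROC-AUC of the detector."""
    adv_x = fgsm(model, clean_x, clean_y, eps)
    scores = torch.cat([
        inconsistency_score(model, feature_layer, clean_x),
        inconsistency_score(model, feature_layer, adv_x),
    ])
    labels = torch.cat([torch.zeros(len(clean_x)), torch.ones(len(adv_x))])
    return roc_auc_score(labels.numpy(), scores.detach().numpy())
```

With a trained GTSRB classifier, one would call `detector_auc(model, model.layer3, images, labels)`, where `model.layer3` is a hypothetical name for whichever convolutional block is hooked, and compare the resulting AUC against the 0.52–0.53 range reported in the abstract.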
