Improving Transferability of Adversarial Examples with Mixed-Representation Attack
Abstract
Although deep neural networks (DNNs) have achieved remarkable performance in image classification, they remain highly vulnerable to adversarial examples, which are crafted by adding human-imperceptible perturbations to benign samples. An important property of adversarial examples is their transferability: the ability to deceive unseen target models, which enables attacks in the black-box setting and helps assess and understand the robustness of DNNs. Recently, various methods have been proposed to boost adversarial transferability, among which input transformation is one of the most effective approaches. We observe that most existing methods in this direction perform geometric transformations in the spatial domain while ignoring potential transformations in the latent space, which may limit the transferability of adversarial examples. To tackle this issue, we propose a novel Mixed-Representation Attack (MRA) that augments input diversity by exploiting transformations on latent representations. Specifically, MRA leverages a Variational Autoencoder to generate representations of the input image and of images randomly sampled from different categories, and then reconstructs images from the mixed representations. Instead of directly computing the average gradient over the reconstructed images, MRA calculates the gradient on the original input mixed with each reconstructed image to generate more transferable adversaries. Extensive experiments on the ImageNet-compatible dataset demonstrate that our MRA achieves state-of-the-art transferability, significantly outperforming various input transformation attacks. Source code will be released at https://github.com/unclelongheu/Mixed-Representation-Attack
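The gradient-averaging scheme described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the mixing coefficients `lam` (latent mixing) and `eta` (input mixing), and the `encode`/`decode`/`model_grad` callables, are hypothetical stand-ins for the paper's VAE and classifier.

```python
import numpy as np

def mra_gradient(x, others, encode, decode, model_grad, lam=0.5, eta=0.5):
    """Sketch of the Mixed-Representation Attack gradient.

    x          -- original input (array)
    others     -- images sampled from different categories
    encode     -- maps an image to its latent representation (stands in for the VAE encoder)
    decode     -- reconstructs an image from a latent code (stands in for the VAE decoder)
    model_grad -- returns the loss gradient w.r.t. an input image
    lam, eta   -- hypothetical mixing weights, not named in the paper
    """
    z_x = encode(x)
    grads = []
    for x_j in others:
        # Mix latent representations of the input and a sampled image,
        # then reconstruct an image from the mixed code.
        z_mix = lam * z_x + (1.0 - lam) * encode(x_j)
        x_rec = decode(z_mix)
        # Per the abstract: take the gradient on the original input
        # mixed with each reconstructed image, not on x_rec alone.
        x_in = eta * x + (1.0 - eta) * x_rec
        grads.append(model_grad(x_in))
    # Average the per-reconstruction gradients for the attack step.
    return np.mean(np.stack(grads), axis=0)

# Toy usage with identity encoder/decoder and a linear surrogate loss
# w . x (so its gradient is the constant vector w).
x = np.array([0.2, 0.5])
others = [np.array([0.8, 0.1]), np.array([0.3, 0.9])]
w = np.array([1.0, -2.0])
g = mra_gradient(x, others, lambda a: a, lambda z: z, lambda a: w)
x_adv = x + 0.03 * np.sign(g)  # one FGSM-style update using the averaged gradient
```

The averaged gradient would then drive an iterative sign-gradient update as in standard transfer attacks; the FGSM-style step above is only illustrative.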