Mixed Perturbation: Generating Directionally Diverse Perturbations for Adversarial Training
Abstract
The adversarial vulnerability of deep learning models poses a significant challenge to the safe commercialization of AI technologies. Although numerous adversarial defenses have been proposed, most offer limited robustness, underscoring the need for continued exploration of the properties and causes of adversarial vulnerability. In this study, we hypothesize that the low adversarial accuracy of adversarially trained models stems from insufficient exploration of, and learning from, adversarial examples that exist on the manifold. To address this, we propose a novel perturbation generation method, “mixed perturbation (MP),” which aims to discover diverse adversarial examples for adversarial training. The proposed method generates perturbations by leveraging information from both the main task and auxiliary tasks, combining them through a random weighted summation. This approach preserves the primary directionality of the main-task perturbation while introducing variability in perturbation directions, enabling the discovery of diverse adversarial examples from a defensive perspective. Extensive experiments on five benchmark datasets show that the non-optimized MP surpasses existing AT methods in several settings, while the optimized MP consistently achieves the highest robustness. We further analyze perturbation diversity and conduct ablation studies to explain MP’s effectiveness. In addition, through combination experiments with a state-of-the-art AT method, we confirm the promising potential of MP for enhancing model robustness and outline directions for future research.
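The core idea of combining main-task and auxiliary-task perturbation directions through a random weighted summation can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the Dirichlet weighting, the rule of assigning the largest weight to the main task, and the FGSM-style signed step are all assumptions made for illustration.

```python
import numpy as np

def mixed_perturbation(grad_main, aux_grads, epsilon=8 / 255, rng=None):
    """Illustrative sketch of a mixed-perturbation (MP) step.

    Assumes MP combines the main-task input gradient with auxiliary-task
    gradients via a random convex (weighted) sum, then takes a signed,
    epsilon-bounded step; the exact weighting scheme is hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample random convex weights over the main task and all auxiliary tasks.
    w = rng.dirichlet(np.ones(1 + len(aux_grads)))
    # Assumption: give the main task the largest weight so the primary
    # perturbation direction is preserved while aux tasks add variability.
    w = np.sort(w)[::-1]
    mixed = w[0] * grad_main
    for wi, g in zip(w[1:], aux_grads):
        mixed = mixed + wi * g
    # FGSM-style signed step, bounded by epsilon in the L-infinity norm.
    return epsilon * np.sign(mixed)
```

Because the weights are resampled on each call, repeated invocations yield directionally diverse perturbations around the main-task direction, which is the property the abstract attributes to MP.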