Evaluating the Transferability of Adversarial Robustness to Target Domains
Abstract
Knowledge transfer is an effective method for learning, particularly useful when labeled data is limited or when training a model from scratch is too expensive. Most research on transfer learning focuses on achieving \emph{accurate} models, overlooking the crucial aspect of adversarial robustness. However, ensuring robustness is vital, especially when applying transfer learning in safety-critical domains. We compare the robustness of models obtained by 11 training procedures on source domains and 3 retraining schemes on target domains, including normal, adversarial, contrastive, and Lipschitz-constrained training variants. Robustness is analyzed via adversarial attacks with respect to two different transfer learning model outputs: (i) the latent representations and (ii) the predictions. Studying latent representations in correlation with predictions is crucial for the robustness of transfer learning models, since the representations are learned solely on the source domain. Besides adversarial attacks that aim to change the prediction, we also analyze the effect of attacking the representations directly. Our results show that adversarial robustness can transfer across domains, but effective robust transfer learning requires techniques that ensure robustness independently of the training data, so that it is preserved during the transfer. Retraining on the target domain has only a minor impact on the robustness of the target model. Representations exhibit greater robustness than predictions across both the source and target domains.
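The abstract does not specify the attack procedure, so the following is only a minimal illustrative sketch of the two attack targets it distinguishes: attacking the predictions versus attacking the latent representations directly. It assumes PyTorch, a hypothetical `encoder`/`classifier` split of the transfer model, inputs scaled to [0, 1], and a generic L-infinity PGD loop; none of these choices are taken from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model_fn, loss_fn, x, eps=8/255, alpha=2/255, steps=10):
    """Generic L-infinity PGD: maximize loss_fn(model_fn(x_adv)) over an eps-ball around x."""
    # Random start inside the eps-ball, clipped to the valid input range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model_fn(x_adv))
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and input range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

# (i) Attack the predictions: maximize the classification loss of the full model.
# x_adv_pred = pgd_attack(lambda z: classifier(encoder(z)),
#                         lambda logits: F.cross_entropy(logits, y), x)

# (ii) Attack the representations directly: push the encoder output away from
# the clean representation (an L2 distance is one possible choice of objective).
# with torch.no_grad():
#     h_clean = encoder(x)
# x_adv_repr = pgd_attack(encoder, lambda h: F.mse_loss(h, h_clean), x)
```

The two variants differ only in which output the loss is computed on: variant (ii) needs no labels and measures robustness of the representations alone, which is what makes it applicable to the source-domain encoder before any target-domain head exists.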