Pruning and Malicious Injection: A Retraining-Free Backdoor Attack on Transformer Models
Abstract
Transformer models remain vulnerable to backdoor attacks, yet existing attack methods typically require resource-intensive retraining or disruptive architectural modification of the target model. To address these limitations, we propose Head Pruning and Malicious Injection (HPMI), a retraining-free backdoor attack that preserves the target model's original architecture and requires only a small subset of data and basic knowledge of the model architecture. HPMI identifies and prunes the least significant attention head, then surgically injects a pre-trained malicious head to establish a stealthy backdoor pathway. We provide a rigorous theoretical justification showing that, under reasonable assumptions, HPMI resists detection and removal by state-of-the-art defenses. Experimental evaluations across multiple benchmarks validate HPMI's effectiveness: it incurs a negligible drop in clean accuracy, achieves an attack success rate exceeding 99.55%, and bypasses state-of-the-art defense mechanisms. Furthermore, compared with retraining-dependent baselines, HPMI achieves superior concealment and robustness while having minimal impact on model utility.
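To make the prune-and-inject step concrete, the following is a minimal sketch of the two operations the abstract describes, assuming a HuggingFace `BertModel`. The helpers `head_importance` and `inject_head`, the norm-of-attention-map significance proxy, and the random placeholder weights are all illustrative assumptions for this sketch; the paper's actual head-selection criterion and malicious-head pre-training procedure are not reproduced here.

```python
# Illustrative sketch only; not the paper's implementation.
import torch
from transformers import BertModel, BertTokenizer


def head_importance(model, inputs):
    """Score every attention head on a small clean batch, using the mean
    norm of its attention map as a cheap significance proxy (assumption)."""
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)
    # out.attentions: one (batch, heads, seq, seq) tensor per layer
    return {layer: attn.norm(dim=(-2, -1)).mean(dim=0)  # (heads,) per layer
            for layer, attn in enumerate(out.attentions)}


def inject_head(model, layer, head, q_w, k_w, v_w):
    """Overwrite one head's Q/K/V weight slices in place with pre-trained
    malicious weights; the model architecture is left untouched."""
    attn = model.encoder.layer[layer].attention.self
    d = attn.attention_head_size
    rows = slice(head * d, (head + 1) * d)  # rows of head `head` in each projection
    with torch.no_grad():
        attn.query.weight[rows] = q_w  # each of shape (d, hidden_size)
        attn.key.weight[rows] = k_w
        attn.value.weight[rows] = v_w


tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()
batch = tok(["a handful of clean samples", "suffice for scoring"],
            return_tensors="pt", padding=True)

# Pick the least significant head across all layers.
scores = head_importance(model, batch)
layer, head = min(((l, h) for l, s in scores.items() for h in range(s.numel())),
                  key=lambda lh: scores[lh[0]][lh[1]].item())

hidden = model.config.hidden_size
d = hidden // model.config.num_attention_heads
# Placeholder malicious weights; in HPMI these come from a separately
# pre-trained, trigger-sensitive head.
inject_head(model, layer, head,
            torch.randn(d, hidden), torch.randn(d, hidden), torch.randn(d, hidden))
```

Because the injected head occupies a slot the importance scoring flagged as least significant, the edit leaves the parameter count and layer layout unchanged, which is what lets the attack avoid both retraining and architectural modification.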