Head Pruning and Malicious Injection: A Retraining-Free Backdoor Attack on Transformer Models

Abstract

Transformer models remain vulnerable to backdoor attacks, yet existing attack methods typically require resource-intensive retraining or disruptive architectural modification of the target model. To address these limitations, we propose Head Pruning and Malicious Injection (HPMI), a retraining-free backdoor attack that preserves the target model's original architecture and requires only a small subset of data and basic architectural knowledge. HPMI identifies and prunes the least significant attention head, then surgically injects a pre-trained malicious head in its place to establish a stealthy backdoor pathway. We provide a rigorous theoretical justification showing that, under reasonable assumptions, HPMI resists detection and removal by state-of-the-art defenses. Experimental evaluations across multiple benchmarks validate HPMI's effectiveness: it incurs a negligible drop in clean accuracy, achieves an attack success rate exceeding 99.55%, and bypasses state-of-the-art defense mechanisms. Compared with retraining-dependent baselines, HPMI achieves superior concealment and robustness while incurring minimal impact on model utility.
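
The abstract does not give implementation details, but the two-step pipeline it describes (score heads, prune the weakest, inject a malicious head in its slot) can be illustrated concretely. The sketch below is a minimal PyTorch illustration under stated assumptions, not the authors' implementation: head importance is approximated by the mean L2 norm of each head's attention output on a small calibration batch, the model is a single nn.MultiheadAttention layer, and malicious_qkv / malicious_out are random placeholders standing in for the pre-trained backdoor head from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

d_model, n_heads = 64, 4
d_head = d_model // n_heads
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

# Step 1 -- score each head on a small calibration batch.
# Assumed importance proxy: mean L2 norm of each head's attention output.
x = torch.randn(8, 16, d_model)                       # (batch, seq, d_model)
batch, seq = x.shape[:2]
with torch.no_grad():
    q, k, v = F.linear(x, attn.in_proj_weight, attn.in_proj_bias).chunk(3, dim=-1)
    # reshape each to (batch, heads, seq, d_head)
    q, k, v = (t.view(batch, seq, n_heads, d_head).transpose(1, 2) for t in (q, k, v))
    weights = torch.softmax(q @ k.transpose(-2, -1) / d_head**0.5, dim=-1)
    head_out = weights @ v                            # (batch, heads, seq, d_head)
    importance = head_out.norm(dim=-1).mean(dim=(0, 2))  # one score per head

victim = int(importance.argmin())                     # least significant head
print(f"pruning head {victim}; importance scores: {importance.tolist()}")

# Step 2 -- overwrite the pruned head's weight slot with the injected head.
# malicious_qkv / malicious_out are placeholders for the pre-trained
# malicious head; the layer's architecture is left untouched.
malicious_qkv = torch.randn(3, d_head, d_model)
malicious_out = torch.randn(d_model, d_head)
rows = slice(victim * d_head, (victim + 1) * d_head)
with torch.no_grad():
    for i in range(3):                                # Q, K, V row blocks
        attn.in_proj_weight[i * d_model + rows.start : i * d_model + rows.stop] = malicious_qkv[i]
    attn.out_proj.weight[:, rows] = malicious_out     # output columns reading this head
```

Because the malicious head occupies an existing slot rather than being appended, the parameter count and layer shapes are unchanged, which is consistent with the abstract's claim that the original architecture is preserved.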
