Mitigation via Adaptive Decomposition (MAD): Geometry-Guided Subspace Decomposition for Robust Backdoor Defense
Abstract
Backdoor attacks embed hidden trigger-label associations into deep neural networks, enabling adversarial control while maintaining clean accuracy. Existing post-training defenses face a critical limitation: global methods such as Sharpness-Aware Minimization fail against adaptive attackers that flatten the loss landscape, while structural approaches sacrifice accuracy through coarse-grained modifications. We propose Mitigation via Adaptive Decomposition (MAD), a post-training defense that exploits the low-rank geometric structure that backdoor triggers imprint in weight space. MAD is based on the observation that backdoor triggers concentrate in low-rank subspaces orthogonal to semantic features. Our method identifies this malicious subspace through eigenanalysis of the clean-data gradient covariance, then applies sharpness-aware optimization exclusively within the estimated subspace. Evaluated on vision benchmarks across five architectures and five attacks, including adaptive variants, MAD reduces attack success rates to ≤ 5.12% while preserving clean accuracy within 0.4% of baseline (p ≤ 1.05×10⁻⁶, n = 30). Ablation studies validate the low-rank hypothesis (TSR ≈ 0.98 at K = 10), and scaling experiments identify a calibration-density threshold of 0.21×10⁻⁴ clean samples per parameter for robust defense across model scales.
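The two-stage pipeline the abstract describes (eigenanalysis of the clean-data gradient covariance, then sharpness-aware optimization restricted to the estimated subspace) can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the function names, the choice of the bottom-K eigenvectors (directions least excited by clean gradients) as the suspect subspace, the quadratic demo loss, and the hyperparameters `rho` and `lr` are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_backdoor_subspace(clean_grads, k=10):
    """Eigenanalysis of the clean-data gradient covariance.
    Here we treat the k directions with the SMALLEST clean-gradient
    variance as the candidate low-rank backdoor subspace (an
    illustrative criterion, not necessarily the paper's)."""
    cov = np.cov(clean_grads, rowvar=False)    # (d, d) covariance
    _, eigvecs = np.linalg.eigh(cov)           # eigenvalues ascending
    return eigvecs[:, :k]                      # orthonormal basis (d, k)

def project_onto_subspace(v, basis):
    """P v with P = B B^T, restricting v to span(basis)."""
    return basis @ (basis.T @ v)

def subspace_sam_step(w, grad_fn, basis, rho=0.05, lr=0.01):
    """One SAM-style step confined to the estimated subspace:
    ascend to w + rho * Pg / ||Pg||, then descend using the
    gradient at the perturbed point, again projected onto the subspace."""
    g = project_onto_subspace(grad_fn(w), basis)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_adv = project_onto_subspace(grad_fn(w + eps), basis)
    return w - lr * g_adv

# Toy demo: anisotropic clean gradients, quadratic loss L(w) = 0.5||w||^2.
d, n, k = 20, 200, 5
clean_grads = rng.normal(size=(n, d)) * np.linspace(1.0, 5.0, d)
basis = estimate_backdoor_subspace(clean_grads, k=k)
w = rng.normal(size=d)
w_new = subspace_sam_step(w, lambda w: w, basis)  # grad of L is w itself

# The update moves w only inside the estimated subspace: the component
# of the update orthogonal to the subspace is (numerically) zero.
delta = w_new - w
residual = delta - project_onto_subspace(delta, basis)
print(np.linalg.norm(residual) < 1e-10)  # → True
```

The key property the sketch exhibits is the one the abstract claims for MAD: because both the ascent perturbation and the descent gradient are projected onto the estimated subspace, weights are modified only along the suspected backdoor directions, leaving the orthogonal (semantic) directions, and hence clean accuracy, untouched.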