Mitigating text data privacy risks from gradient and model inversion attacks with a dual-pronged defense
Abstract
Gradient inversion and model inversion attacks pose serious privacy risks to text data by recovering training samples during or after model training. Existing defenses rely mainly on perturbation or regularization, but most are tailored to a single attack setting, often incur unfavorable privacy-utility trade-offs, and may not apply directly to text data. To address these issues, we propose DGIMI (Defense against Gradient Inversion and Model Inversion), a defense framework that mitigates both attacks on text data while preserving model utility. Our framework is motivated by the observation that input information is encoded in intermediate representations, exposed in gradients during training, and gradually memorized in model parameters. Accordingly, DGIMI intervenes along this propagation path. Before training, DGIMI freezes a subset of parameters, pre-selected via a Fisher information analysis of each parameter's sensitivity to both task performance and inversion effectiveness. During training, DGIMI mixes the intermediate representations and labels of multiple samples and injects perturbations into the mixed representation. To further reduce utility loss, DGIMI uses information entropy to identify privacy-sensitive representation dimensions, enabling targeted perturbation. This representation-level perturbation is well suited to language models, where directly perturbing discrete text inputs is less practical than manipulating continuous internal representations. Theoretical analysis shows that DGIMI raises the lower bounds of the reconstruction losses for both attacks while maintaining convergence. Experiments on multiple language models and text datasets show that DGIMI reduces privacy leakage in both attack settings while retaining competitive task performance.
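The training-time mechanism described above (mixing representations and labels across samples, then perturbing only privacy-sensitive dimensions) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the mixing rule (Dirichlet convex weights), the per-dimension entropy estimator (a histogram-based Shannon entropy), and all parameter names (`noise_scale`, `top_frac`, `mix_and_perturb`) are assumptions for demonstration, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_and_perturb(reps, labels, noise_scale=0.1, top_frac=0.5):
    """Hypothetical sketch of the training-time defense step: mix the
    intermediate representations and labels of a mini-batch, then add
    noise only to the dimensions scored as most privacy-sensitive
    (here, highest per-dimension entropy across the batch)."""
    # Mix representations and labels with random convex weights
    # (the paper's exact mixing rule is not given in the abstract).
    w = rng.dirichlet(np.ones(len(reps)))
    mixed_rep = np.average(reps, axis=0, weights=w)
    mixed_label = np.average(labels, axis=0, weights=w)

    # Score each dimension by the Shannon entropy of its values across
    # the batch: higher entropy ~ carries more input information.
    def dim_entropy(column, bins=8):
        hist, _ = np.histogram(column, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    scores = np.array([dim_entropy(reps[:, d]) for d in range(reps.shape[1])])
    k = max(1, int(top_frac * reps.shape[1]))
    sensitive = np.argsort(scores)[-k:]  # most privacy-sensitive dimensions

    # Targeted perturbation: noise only on the sensitive dimensions,
    # leaving the remaining dimensions intact to limit utility loss.
    noisy_rep = mixed_rep.copy()
    noisy_rep[sensitive] += rng.normal(0.0, noise_scale, size=k)
    return noisy_rep, mixed_label

reps = rng.normal(size=(4, 16))    # batch of 4 intermediate representations
labels = np.eye(3)[[0, 1, 2, 1]]   # one-hot labels for 3 classes
rep, lab = mix_and_perturb(reps, labels)
```

Because only a fraction of the dimensions receive noise, the perturbed representation stays close to the clean mixed representation on the remaining dimensions, which is the intuition behind the claimed privacy-utility trade-off.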