Mitigating text data privacy risks from gradient and model inversion attacks with a dual-pronged defense
Abstract
Gradient inversion and model inversion attacks pose serious privacy risks to text data by recovering training samples during or after model training. Existing defenses rely mainly on perturbation or regularization, but most are tailored to a single attack setting, often incur unfavorable privacy-utility trade-offs, and may not apply directly to text data. To address these issues, we propose DGIMI (Defense against Gradient Inversion and Model Inversion), a defense framework that mitigates both attacks on text data while preserving model utility. Our framework is motivated by the observation that input information is encoded in intermediate representations, exposed in gradients during training, and gradually memorized in model parameters. Accordingly, DGIMI intervenes along this propagation path. Before training, DGIMI freezes a subset of parameters, pre-selected via a Fisher information analysis of each parameter's sensitivity to both task performance and inversion effectiveness. During training, DGIMI mixes the intermediate representations and labels of multiple samples and injects perturbations into the mixed representation. To further reduce utility loss, DGIMI uses information entropy to identify privacy-sensitive representation dimensions, enabling targeted perturbation. This representation-level perturbation is well suited to language models, where directly perturbing discrete text inputs is less practical than manipulating continuous internal representations. Theoretical analysis shows that DGIMI raises the lower bounds of the reconstruction losses for both attacks while maintaining convergence. Experiments on multiple language models and text datasets show that DGIMI reduces privacy leakage in both attack settings while retaining competitive task performance.
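The training-time mechanism described above (mixing representations and labels across samples, then perturbing only privacy-sensitive dimensions) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the mixing rule (Dirichlet convex weights), the per-dimension entropy estimator (a histogram-based Shannon entropy), and all parameter names (`noise_scale`, `top_frac`, `mix_and_perturb`) are assumptions for demonstration, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_and_perturb(reps, labels, noise_scale=0.1, top_frac=0.5):
    """Hypothetical sketch of the training-time defense step: mix the
    intermediate representations and labels of a mini-batch, then add
    noise only to the dimensions scored as most privacy-sensitive
    (here, highest per-dimension entropy across the batch)."""
    # Mix representations and labels with random convex weights
    # (the paper's exact mixing rule is not given in the abstract).
    w = rng.dirichlet(np.ones(len(reps)))
    mixed_rep = np.average(reps, axis=0, weights=w)
    mixed_label = np.average(labels, axis=0, weights=w)

    # Score each dimension by the Shannon entropy of its values across
    # the batch: higher entropy ~ carries more input information.
    def dim_entropy(column, bins=8):
        hist, _ = np.histogram(column, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    scores = np.array([dim_entropy(reps[:, d]) for d in range(reps.shape[1])])
    k = max(1, int(top_frac * reps.shape[1]))
    sensitive = np.argsort(scores)[-k:]  # most privacy-sensitive dimensions

    # Targeted perturbation: noise only on the sensitive dimensions,
    # leaving the remaining dimensions intact to limit utility loss.
    noisy_rep = mixed_rep.copy()
    noisy_rep[sensitive] += rng.normal(0.0, noise_scale, size=k)
    return noisy_rep, mixed_label

reps = rng.normal(size=(4, 16))    # batch of 4 intermediate representations
labels = np.eye(3)[[0, 1, 2, 1]]   # one-hot labels for 3 classes
rep, lab = mix_and_perturb(reps, labels)
```

Because only a fraction of the dimensions receive noise, the perturbed representation stays close to the clean mixed representation on the remaining dimensions, which is the intuition behind the claimed privacy-utility trade-off.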