Research on Citizen Privacy Risk Assessment Method Based on Retrieval-Augmented Generation
Abstract
Serious infringements of citizens' personal privacy make it crucial to assess the severity of personal information leaks, yet current privacy risk assessment methods suffer from low efficiency and accuracy. To address these problems, this paper proposes a privacy risk assessment framework based on Retrieval-Augmented Generation (RAG). First, we analyze the privacy risk factors in cases of personal information infringement, construct an evaluation indicator system, and build a fine-tuning dataset and a knowledge graph. Next, we propose a fine-tuning method that dynamically adjusts the LoRA rank: it monitors changes in the loss and gradients and automatically contracts the dimensions of the low-rank matrices, reducing GPU memory consumption while maintaining generation quality. Finally, we use RAG to combine the internal knowledge of fine-tuned LLMs with weighted external evidence through joint reasoning, yielding more reliable judgments and privacy risk assessments. Compared with traditional approaches, the method achieves higher accuracy and better text matching, which supports its effectiveness. It also mitigates the hallucination problem of large language models while improving the accuracy and efficiency of privacy risk assessment.
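The rank-contraction idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual trigger or pruning rule, so everything here is an assumption made for illustration: the function names `loss_has_plateaued` and `contract_rank`, the plateau test on a moving average of the loss, and the choice to score each rank component of the LoRA update B·A by the Frobenius norm of its rank-one term. Gradient monitoring is omitted.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def loss_has_plateaued(losses: Sequence[float], window: int = 3, tol: float = 1e-2) -> bool:
    """Hypothetical trigger: True when the mean loss of the last `window` steps
    improved by less than `tol` (relative) over the preceding window.
    A loss that stalls or worsens also counts as a plateau."""
    if len(losses) < 2 * window:
        return False
    prev = sum(losses[-2 * window:-window]) / window
    last = sum(losses[-window:]) / window
    return (prev - last) / max(abs(prev), 1e-12) < tol


def _norm(v: Sequence[float]) -> float:
    return sum(x * x for x in v) ** 0.5


def contract_rank(A: Matrix, B: Matrix, new_rank: int) -> Tuple[Matrix, Matrix]:
    """Keep the `new_rank` highest-scoring rank components of the update B @ A.

    A has shape (r, d_in) and B has shape (d_out, r), both as lists of rows.
    Component i is scored by ||B[:, i]|| * ||A[i, :]||, which equals the
    Frobenius norm of the rank-one term it contributes (assumed criterion).
    """
    r = len(A)
    if not 1 <= new_rank <= r:
        raise ValueError("new_rank must be between 1 and the current rank")
    scores = [_norm(A[i]) * _norm([row[i] for row in B]) for i in range(r)]
    top = sorted(range(r), key=lambda i: scores[i], reverse=True)[:new_rank]
    keep = sorted(top)  # preserve the original ordering of the kept components
    A_new = [A[i] for i in keep]
    B_new = [[row[i] for i in keep] for row in B]
    return A_new, B_new


# Toy example: rank 4, d_in = 2, d_out = 2 (values chosen only for illustration).
losses = [1.0, 0.8, 0.6, 0.5, 0.499, 0.498, 0.498, 0.497, 0.497]
A = [[1.0, 0.0], [0.01, 0.0], [0.0, 2.0], [0.0, 0.02]]
B = [[1.0, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.1]]

if loss_has_plateaued(losses):
    A, B = contract_rank(A, B, new_rank=len(A) // 2)

print("rank after contraction:", len(A))
```

In this sketch the adapter shrinks only once the loss curve flattens, so the trainable matrices (and their optimizer state) get smaller later in training. A real implementation would operate on framework tensors and would also need to resize the optimizer state for the pruned rows and columns.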