BadInterpreter: Backdoor Attack on LLM-based Interpretable Recommendation

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Large Language Models (LLMs) has promoted miscellaneous models and downstream applications, driving the progress of LLM agents by enhancing their ability to comprehend and generate interpretable reasoning. Recently, the security of LLM agents has become an increasingly popular research topic, where backdoor attacks show potential devastation by injecting a covert backdoor to manipulate the output. Our findings show that LLM agents fine-tuned for recommendation tasks are particularly vulnerable to the embedding of imperceptible backdoors, even when recommendation explanations are required. We introduce BadInterpreter, a simple yet effective backdoor attack for LLM-based interpretable recommendation systems, enabling attackers to manipulate product recommendations and explanations without altering ground-truth labels. In interpretable recommendation, LLM agents are asked to provide explanations for product recommendations to meet user needs. We propose a novel LLM-based pipeline to construct poisoned fine-tuning data, where the agent is expected to recommend the target product with rational recommendation explanations. Attacked by our BadInterpreter, LLM agents prioritize recommending the target products whose information contains attacker-designed triggers in a dynamic interactive environment, along with convincing explanations. Our attack consistently achieves robust attack success rates exceeding 94% on two benchmark e-shopping datasets with four distinct LLMs. While backdoor attacks represent a well-explored threat in natural language processing models, their application and impact within the specific context of LLM-based interpretable recommendation systems remain largely uncharted territory. To our knowledge, this study pioneers the investigation of such vulnerabilities in this critical domain. Our work reveals that constructing LLM-based recommendation systems on untrusted LLMs poses a severe threat.

Article activity feed