Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
Abstract
The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper, we aim to understand the security vulnerabilities associated with LLM-based multi-agent systems. We design a novel two-stage attack methodology, consisting of Persuasiveness Injection and Manipulated Knowledge Injection, to investigate the potential for manipulated knowledge to permeate trusted third-party platforms without explicit adversarial prompt attacks. Through extensive experiments, we demonstrate that our attack method can successfully induce LLM-based agents to spread both counterfactual and toxic knowledge during agent communication without degrading their foundational capabilities. Furthermore, we show that these manipulations can persist through popular retrieval-augmented generation (RAG) frameworks, where several benign agents store and retrieve manipulated chat histories for future interactions. To mitigate the potential risk, we propose two defense strategies: designing system prompts that encourage agents to critically verify the knowledge they share, and incorporating supervisory agents to oversee interactions. Results demonstrate that even these straightforward strategies can significantly reduce the spread success rate. This work highlights the critical vulnerabilities inherent in LLM-based multi-agent communities and calls for platforms to swiftly adopt targeted, cost-effective defenses against the spread of manipulated knowledge.