Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

Tianjie Ju
Yiting Wang
Yi Hua
Xinbei Ma
Pengzhou Cheng
Haodong Zhao
Yulong Wang
Lifeng Liu
Jian Xie
Zhuosheng Zhang
Gongshen Liu

Abstract

The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper, we aim to understand the security vulnerabilities associated with LLM-based multi-agent systems. We design a novel two-stage attack methodology, consisting of Persuasiveness Injection and Manipulated Knowledge Injection, to investigate the potential for manipulated knowledge to permeate trusted third-party platforms without explicit adversarial prompt attacks. Through extensive experiments, we demonstrate that our attack method can successfully induce LLM-based agents to spread both counterfactual and toxic knowledge during agent communication without degrading their foundational capabilities. Furthermore, we show that these manipulations can persist through popular retrieval-augmented generation (RAG) frameworks, where several benign agents store and retrieve manipulated chat histories for future interactions. To mitigate the potential risk, we propose two defense strategies: designing system prompts that encourage agents to critically verify the knowledge they share, and incorporating supervisory agents to oversee interactions. Results demonstrate that even these straightforward strategies can significantly reduce the spread success rate. This work highlights the critical vulnerabilities inherent in LLM-based multi-agent communities and calls for platforms to swiftly adopt targeted, cost-effective defenses against the spread of manipulated knowledge.

Version published to 10.21203/rs.3.rs-5292520/v1 on Research Square
Nov 1, 2024

