Large Language Models May Promote Radicalization via Exposure to Extremist Content and Politicized Communities


Abstract

Large language models (LLMs) are used by the public to summarize, recommend, and explain online content. To prevent the generation of potentially harmful output and recommendations, most publicly accessible LLMs have implemented safeguards that are triggered when users ask for illegal content or about proscribed groups. However, these models are pre-trained on internet data that may contain connections between benign and more extreme content and communities. Users may therefore ask for seemingly innocuous content about alternative communities but receive potentially radicalizing output. To investigate this possibility, we conducted an experiment with two publicly available, widely used LLMs (OpenAI GPT-4o and Google Gemini). We systematically manipulated the nature of prompts asking for summaries of relatively benign content about the Tradwives community during multi-turn conversations (N = 896). We show that benign prompts, especially those involving roleplay and retrieval-augmented generation (RAG), can bypass safeguards and expose users to potentially radicalizing content and connections. The implication is that even with guardrails in place, and even when users do not intend to seek illegal content, LLMs could play a role in the radicalization process.