Large Language Models May Promote Radicalization via Exposure to Extremist Content and Politicized Communities


Abstract

Large language models (LLMs) are used by the public to summarize, recommend, and explain online content. To prevent the generation of potentially harmful output and recommendations, most publicly accessible LLMs have implemented safeguards that are triggered when users ask for illegal content or about proscribed groups. However, these models are pre-trained on internet data that may contain connections between benign and more extreme content and communities. Users may therefore ask for seemingly innocuous content about alternative communities but receive potentially radicalizing output. To investigate this possibility, we conducted an experiment with two publicly available, widely used LLMs (OpenAI GPT-4o and Google Gemini). We systematically manipulated the nature of prompts asking for summaries of relatively benign content about the Tradwives community during multi-turn conversations (N = 896). We show that benign prompts, especially those involving roleplay and retrieval-augmented generation (RAG), can bypass safeguards and expose users to potentially radicalizing content and connections. The implication is that even with guardrails in place, and even when users do not intend to seek illegal content, LLMs could play a role in the radicalization process.