Using large language models to track national identities

Stefan Leach
Rebecca Gregson
Jan Nikadon
Magdalena Formanowicz
Chiara Zazzarino
Andrew Kitchin
Michal Kosinski
Jay Joseph Van Bavel
Aleksandra Cichocka


Abstract

National identity drives a range of political action, from civic engagement to intergroup violence. We present a novel and accessible approach using large language models (LLMs) to label texts for expressions of positive (national identification, patriotism) and defensive (nationalism, national narcissism) national identities. We test four popular LLMs (GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Pro, and Llama 4 Maverick) across 13 million words sourced from social media, surveys, and political speeches in 25 languages. Our findings reveal that LLMs label texts reliably and in a theoretically consistent manner, producing judgments that closely match those of domain experts and outperform both dictionary-based approaches and crowdworkers, while also reducing the cost by a factor of 1,000 compared to crowdworkers. An analysis of US presidential addresses using the best-performing LLM (GPT-4o) reveals that expressions of national identities doubled over the twentieth century. Further studies revealed differences between Republicans and Democrats: defensive national identities were five times more prevalent in Republicans' social media posts than in Democrats'. Such identities were also frequent in the speeches of populist leaders around the globe. Together, these findings demonstrate that LLMs offer a reliable, valid, accessible, and cost-effective approach to labeling texts for nuanced expressions of national identity, enabling new insights into its role in contemporary and historical social trends.
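
To make the labeling idea concrete, here is a minimal Python sketch of how such a pipeline might look. It is an illustration under stated assumptions, not the authors' method: the prompt wording, the single-label scheme, the `"unparsed"` fallback, and the helper names (`build_prompt`, `parse_label`, `label_texts`, `agreement`) are all hypothetical, and `ask_model` stands in for whichever LLM client is used. The abstract does not say which agreement metric the authors report, so plain percent agreement is used here only as a placeholder.

```python
from typing import Callable, List, Sequence

# Hypothetical label set built from the four constructs named in the abstract,
# plus "none" for texts that express no national identity.
LABELS = (
    "national identification",
    "patriotism",
    "nationalism",
    "national narcissism",
    "none",
)


def build_prompt(text: str) -> str:
    """Build a single-label classification prompt (wording is an assumption)."""
    options = ", ".join(LABELS)
    return (
        "Classify the text for expressions of national identity.\n"
        f"Answer with exactly one of: {options}.\n\n"
        f"Text: {text}"
    )


def parse_label(reply: str) -> str:
    """Normalize a model reply; anything outside LABELS is marked 'unparsed'."""
    cleaned = reply.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else "unparsed"


def label_texts(texts: Sequence[str], ask_model: Callable[[str], str]) -> List[str]:
    """Label each text; ask_model is any function mapping a prompt to a reply."""
    return [parse_label(ask_model(build_prompt(t))) for t in texts]


def agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Share of items on which two label sequences agree (percent agreement)."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label sequences must be non-empty and equally long")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)
```

Keeping `ask_model` as a plain callable means the same code can be pointed at different models and compared against expert labels with `agreement`, without tying the sketch to any one provider's API.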

Version published to 10.31234/osf.io/9jcvr_v1 on OSF Preprints
Mar 8, 2026

