Automated information extraction from text variables in event datasets with large language models

Laura Braun
Christian Oswald

Abstract

The advent of conflict event data initiated various research agendas analyzing subnational conflict processes. Event data are mostly confined to information on the type of violence and actors, the number of casualties, time, and location. However, the text sources on which they are based contain much more information. Some datasets provide summaries for individual observations. Using abductions and forced disappearances in the Armed Conflict Location and Event Data, we generate additional variables from this text variable to enable applications previously out of reach for researchers and demonstrate that large language models can extract additional information about events with an accuracy of over 90 percent. Our proposed approach can be easily adjusted or extended to different outcomes and variables of interest.
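As a rough illustration of the workflow the abstract describes, the sketch below composes an extraction prompt for one event summary, parses a JSON reply, and scores extracted values against hand-coded labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the variable names in `FIELDS`, the prompt wording, and the helpers `build_prompt`, `parse_response`, and `accuracy` are all hypothetical, and the call to an actual language model is deliberately left out.

```python
import json

# Hypothetical variables of interest; these are NOT taken from the preprint.
FIELDS = {
    "victim_count": "number of people abducted, as an integer, or null if not stated",
    "victim_released": "true, false, or null if not stated",
}


def build_prompt(summary, fields=FIELDS):
    """Compose an instruction asking for one JSON object for an event summary."""
    lines = [f'- "{name}": {desc}' for name, desc in fields.items()]
    return (
        "Read the event summary and answer with a single JSON object "
        "containing exactly these keys:\n"
        + "\n".join(lines)
        + "\n\nSummary: "
        + summary
    )


def parse_response(raw, fields=FIELDS):
    """Parse a model reply; every field falls back to None on invalid output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {name: data.get(name) for name in fields}


def accuracy(predicted, gold):
    """Share of hand-coded values that the extracted values match exactly."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equally long")
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)
```

In use, the string returned by `build_prompt` would be sent to whichever model is chosen, the reply passed through `parse_response`, and one extracted variable at a time compared with a manually coded validation sample via `accuracy`. Treating unparseable replies as missing values, rather than raising an error, keeps a long batch run from stopping on a single malformed answer.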

Version published to 10.31235/osf.io/yxp8k_v3 on OSF Preprints
Aug 5, 2025
Version published to 10.31235/osf.io/yxp8k_v2 on OSF Preprints
May 14, 2025
Version published to 10.31235/osf.io/yxp8k_v1 on OSF Preprints
Mar 23, 2025

Text as Data for Crisis-Early Warning: A Comparative Assessment of NLP Methods for Conflict Prediction

This article has 1 author:
1. Julian Walterskirchen
This article has no evaluations. Latest version Dec 23, 2025

Exploration of Large Language Models for Geotagging of Social Media Posts

This article has 2 authors:
1. Riwaz Udas
2. Richard Sinnott
This article has no evaluations. Latest version Feb 3, 2026

Issues in Using News Accounts in Process Analysis of Protest Episodes

This article has 3 authors:
1. Pamela Elaine Oliver
2. Chaeyoon Lim
3. Anna Milewski
This article has no evaluations. Latest version Jan 17, 2026
