Automated information extraction from text variables in event datasets with large language models


Abstract

The advent of conflict event data initiated various research agendas analyzing subnational conflict processes. Event data are mostly confined to information on the type of violence and actors, the number of casualties, time, and location. However, the text sources on which they are based contain much more information, and some datasets provide text summaries for individual observations. Using abductions and forced disappearances in the Armed Conflict Location and Event Data (ACLED), we generate additional variables from this text variable, enabling applications previously out of reach for researchers. We demonstrate that large language models can extract additional information about events with an accuracy of over 90 percent. The proposed approach can be easily adjusted or extended to different outcomes and variables of interest.
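As an illustration only, the following minimal sketch shows how a new variable might be extracted from an event summary and scored against hand-coded labels. It is not the authors' pipeline: the variable (`freed`), the prompt wording, the toy summaries, and the `stub_model` function are all hypothetical stand-ins, and a real application would replace `stub_model` with a call to an actual large language model.

```python
import json
from typing import Callable, List, Optional

# Hypothetical question; the abstract does not name the extracted variables.
QUESTION = 'Did the victim regain freedom? Reply with JSON like {"freed": true}.'


def build_prompt(note: str) -> str:
    """Combine one event summary with the extraction question."""
    return f"Event summary:\n{note}\n\n{QUESTION}"


def parse_answer(raw: str) -> Optional[bool]:
    """Return the boolean under "freed", or None if the reply is unusable."""
    try:
        value = json.loads(raw).get("freed")
    except (json.JSONDecodeError, AttributeError):
        return None
    return value if isinstance(value, bool) else None


def extract(notes: List[str], model: Callable[[str], str]) -> List[Optional[bool]]:
    """Run the model on every summary and parse each reply."""
    return [parse_answer(model(build_prompt(note))) for note in notes]


def accuracy(predictions: List[Optional[bool]], gold: List[bool]) -> float:
    """Share of predictions that match the hand-coded labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def stub_model(prompt: str) -> str:
    """Placeholder for a real LLM call (assumption): a simple keyword rule."""
    return json.dumps({"freed": "released" in prompt.lower()})


# Invented toy summaries, not actual dataset records.
NOTES = [
    "Armed men abducted a trader near the market. He was released two days later.",
    "A journalist was taken from her home by gunmen. Her whereabouts remain unknown.",
    "Three farmers were kidnapped and released after a ransom was paid.",
]
GOLD = [True, False, True]

if __name__ == "__main__":
    predictions = extract(NOTES, stub_model)
    print(f"accuracy: {accuracy(predictions, GOLD):.2f}")
```

Unparseable replies are counted as errors rather than dropped, so the accuracy figure is not inflated by silently skipping difficult cases.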
