Automated information extraction from text variables in event datasets with large language models
Abstract
The advent of conflict event data initiated various research agendas analyzing subnational conflict processes. Event data are mostly confined to information on the type of violence and actors, the number of casualties, time, and location. However, the text sources on which they are based contain much more information, and some datasets provide text summaries for individual observations. Using abductions and forced disappearances in the Armed Conflict Location and Event Data, we generate additional variables from these summaries, enabling applications previously out of reach for researchers, and demonstrate that large language models can extract additional information about events with an accuracy of over 90 percent. Our proposed approach can be easily adjusted or extended to different outcomes and variables of interest.