A Pipeline for Extracting Data from Videos of Complex Political Events
Abstract
Political scientists now regularly use audio, video, and text data to investigate questions about deliberation, representation, and emotion. However, existing research using multimodal data focuses largely on highly professionalized settings such as national legislatures, where pre-processed data is often available. Although scholars can increasingly access a wide range of data on less standardized environments such as campaign events, committee hearings, and local government meetings, using videos of these events for research poses a number of common measurement challenges, including low-quality transcripts, missing speaker information, idiosyncratic production styles, and varying formats. In this paper, we present a streamlined pipeline using open-source tools to automatically extract text (e.g., transcription), audio (e.g., vocal features), and images (e.g., scene detection) from videos of complex political events. The outputs of our pipeline can then be readily used for a wide range of substantive analyses. We validate our approach through an examination of local government meetings in the United States, showing how we can accurately segment audio, identify speakers, transcribe speech, and detect gender in videos of varying structure and audio/video quality. As a demonstration, we examine participation by gender in over 1,000 hours of school board meeting videos.
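The abstract describes pipeline outputs that are speaker-labeled, timestamped, gender-tagged transcript segments, and a demonstration that measures participation by gender. As a minimal illustration of how such outputs might feed a downstream analysis, the sketch below aggregates speaking time per detected gender over a list of segments. The `Segment` schema, field names, and label values here are hypothetical, not the authors' actual data format or implementation:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One diarized, transcribed span of speech (hypothetical schema)."""
    start: float    # segment start time, in seconds
    end: float      # segment end time, in seconds
    speaker: str    # diarization speaker label
    gender: str     # label from a gender classifier, e.g. "F" / "M"
    text: str       # transcribed speech

def speaking_time_by_gender(segments):
    """Total speaking time (seconds) per detected gender label."""
    totals = {}
    for seg in segments:
        totals[seg.gender] = totals.get(seg.gender, 0.0) + (seg.end - seg.start)
    return totals

# Illustrative segments from a hypothetical school board meeting video
segments = [
    Segment(0.0, 12.5, "spk_0", "F", "Call to order."),
    Segment(12.5, 40.0, "spk_1", "M", "First agenda item."),
    Segment(40.0, 55.0, "spk_0", "F", "Motion to approve."),
]
print(speaking_time_by_gender(segments))  # → {'F': 27.5, 'M': 27.5}
```

The same aggregation extends naturally to per-speaker or per-meeting summaries once the pipeline has produced segments of this shape for each video.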