YouPol: A Collaborative Research Infrastructure and Database for Political Content on YouTube and TikTok
Abstract
This paper presents YouPol (YouTube and TikTok Political Observatory and Longitudinal database), a continuously updated research infrastructure that captures what political content creators actually say on video platforms. As of April 2026, the continuously expanding corpus comprises 25,397 videos from 68 channels across France and Quebec, with full speaker-diarized transcripts (645,738 segments, 3.18 million annotated sentences) and 7.7 million archived comments. The infrastructure includes an independent transcription pipeline that produces high-quality transcripts regardless of platform-provided captions, and an LLM-in-the-loop annotation framework built on the open-source LLM Tool platform (Lemor et al., 2025) that can train sentence-level classifiers for any research project. Current projects cover political content detection, far-right ideology, gendered rhetoric, and neo-reactionary discourse. To produce transcription and metadata updates in real time, YouPol also introduces the YouPol Collaborative Computing Network (YCCN), which allows any collaborating researcher to contribute processing capacity from their own machine, freeing the observatory from dependence on institutional computing clusters. YouPol addresses four gaps in the literature: (1) the ideological substance of political video content remains empirically inaccessible through metadata alone; (2) content deletion and deplatforming erase material before researchers can study it; (3) longitudinal engagement dynamics are underexploited; and (4) no existing dataset preserves comments over time or tracks their deletion. The observatory has already preserved 2,305 videos and three entirely deleted channels that are no longer available on the platform. The dataset and API are available at https://data.you-pol.com/.
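To illustrate gap (4), tracking comment deletion over time, the following is a minimal sketch of how deletions could be inferred by comparing successive crawls of a video's comment section. It is not taken from the YouPol codebase: the `Snapshot` structure, the `track_deletions` function, and the sample comment IDs and dates are all hypothetical, and the actual pipeline may work quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One crawl of a video's comment section (hypothetical structure)."""
    taken_at: str            # ISO 8601 date of the crawl
    comment_ids: frozenset   # IDs of the comments visible at that time


def track_deletions(snapshots):
    """Map each vanished comment ID to the first crawl date it was missing.

    A comment counts as deleted once it has been seen in an earlier
    snapshot and is absent from a later one.
    """
    ordered = sorted(snapshots, key=lambda s: s.taken_at)
    seen = set()      # every comment ID observed so far
    deleted = {}      # comment ID -> date of first absence
    for snap in ordered:
        for cid in seen - snap.comment_ids:
            # setdefault keeps the earliest date of absence
            deleted.setdefault(cid, snap.taken_at)
        seen |= snap.comment_ids
    return deleted


# Illustrative data only, not drawn from the corpus.
snapshots = [
    Snapshot("2026-01-01", frozenset({"c1", "c2", "c3"})),
    Snapshot("2026-02-01", frozenset({"c1", "c3", "c4"})),
    Snapshot("2026-03-01", frozenset({"c1", "c4"})),
]
print(track_deletions(snapshots))
```

In this toy example, `c2` is first missing in the February crawl and `c3` in the March crawl. The sketch treats any absence as a deletion, so a comment that is temporarily hidden and later restored would still be flagged; a production system would likely need to handle that case.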