A methodological tutorial in Python for automated content analysis of digital videos using artificial intelligence
Abstract
Background. The exponential growth of short social media videos has created new opportunities for research. However, traditional video content analysis remains labor-intensive and therefore difficult to scale. This methodological paper provides a practical, step-by-step tutorial for conducting automated content analysis of digital videos using a multimodal large language model (LLM; Gemini 3 Pro) accessed through an application programming interface (API).

Methods. Using Python in a cloud notebook environment, we demonstrate how to (1) collect a public dataset of TikTok videos, (2) upload the videos via the Google Files API, (3) apply a codebook-based prompt to extract structured variables, (4) constrain the outputs to a JSON template, (5) implement robust error handling and reprocessing logic, and (6) export the results for statistical analysis. The tutorial is illustrated with an open dataset of 1,028 TikTok videos on weight loss, yielding one JSON record per video that includes a video description, a topic classification, and an indicator of explicit weight-loss product advertising, plus additional attributes (e.g., framing, identity, narrative type, call to action) when advertising is detected.

Results. The full run produced 1,028 JSON files in 11.39 hours at a cost of USD 20.27. Human–LLM coding agreement, assessed on a random subset using Krippendorff’s alpha, was high (mean 94.87%).

Conclusion. The provided Python code and results demonstrate that the method is practical and can be scaled to analyze thousands, if not hundreds of thousands, of short digital videos.
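Steps (4) to (6) of the workflow can be illustrated with a minimal sketch, which is not the authors' published code. It assumes a hypothetical `code_video` callable that wraps the upload and LLM request and returns the model's reply as a JSON string. The field names in `REQUIRED_FIELDS`, the retry count, and the backoff delay are illustrative assumptions rather than the tutorial's actual codebook or settings.

```python
import csv
import json
import time

# Hypothetical template: field names are illustrative, not the authors' codebook.
REQUIRED_FIELDS = {
    "video_id": str,
    "description": str,
    "topic": str,
    "is_weight_loss_ad": bool,
}


def validate_record(raw):
    """Parse a model reply and check it against the template (step 4)."""
    record = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return record


def code_with_retries(video_id, code_video, max_attempts=3, base_delay=1.0):
    """Call the (assumed) coder and retry with exponential backoff (step 5)."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return validate_record(code_video(video_id))
        except Exception as exc:  # real SDK error types vary; kept broad in this sketch
            last_error = exc
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"{video_id}: failed after {max_attempts} attempts") from last_error


def run_batch(video_ids, code_video, csv_path, **retry_kwargs):
    """Code every video, collect failures for reprocessing, export a CSV (step 6)."""
    records, failed = [], []
    for video_id in video_ids:
        try:
            records.append(code_with_retries(video_id, code_video, **retry_kwargs))
        except RuntimeError:
            failed.append(video_id)  # candidates for a later reprocessing pass
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=list(REQUIRED_FIELDS), extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(records)
    return records, failed
```

Passing the coder as a callable keeps the validation, retry, and export logic independent of any particular SDK, so it can be exercised with a stub before spending API budget. The returned `failed` list can be passed to `run_batch` again as a reprocessing pass.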