Size Matters(?): Utilizing Small LLMs for Annotation in Social Science
Abstract
Annotation is often a time-intensive and costly aspect of social science research that uses natural language data. Recent advances in large language models (LLMs) and generative pretrained transformers promise new methods for quick and easy annotation but often rely on commercial APIs or cloud services that introduce costs, limit researcher control, and raise privacy concerns. Bias originating in the training data introduces further issues for this approach. This paper investigates the feasibility of LLM annotation using small (fewer than 14B parameters) models executed on consumer-grade hardware and further examines potential model bias. The study assesses binary topic annotation quality for six different models, two topics, and two historical periods on political speeches from the German Bundestag between 1949 and 2025. Standard metrics, including F1 scores, are calculated against a human-annotated gold standard. Results indicate that most of the tested models perform strongly, with F1 scores ranging from 0.7 to 0.9 for both topic annotation tasks; the annotation of discussions of abortion generally surpasses the annotation of mentions of economic topics. Performance also varies systematically with the period from which the speeches originate, with annotation quality being higher for older speeches. The findings suggest that small, locally executed LLMs can serve as low-cost annotation tools, while also highlighting the need to account for topic-, period-, and model-specific bias when crafting a study's research design around LLM annotation.
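To make the described workflow concrete, the following is a minimal sketch of binary topic annotation with a small, locally executed model, scored against a human-annotated gold standard. The local server (Ollama at localhost:11434), the model name, and the prompt wording are illustrative assumptions, not details taken from the paper.

```python
# Sketch: binary topic annotation with a small local LLM,
# evaluated against a human-annotated gold standard via F1.
# Assumptions (not from the paper): an Ollama server running locally
# and the model "llama3.2:3b"; prompt wording is illustrative only.
import requests
from sklearn.metrics import f1_score

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2:3b"  # any small (<14B parameter) local model could be substituted


def annotate(speech: str, topic: str) -> int:
    """Ask the local model for a yes/no judgement and map it to 1/0."""
    prompt = (
        f"Does the following Bundestag speech discuss the topic '{topic}'? "
        f"Answer only 'yes' or 'no'.\n\nSpeech:\n{speech}"
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    answer = resp.json()["response"].strip().lower()
    return 1 if answer.startswith("yes") else 0


# Toy placeholders for the speech corpus and the human gold standard.
speeches = ["...", "..."]   # speech texts
gold = [1, 0]               # human annotations (1 = topic present)

predictions = [annotate(s, "abortion") for s in speeches]
print("F1:", f1_score(gold, predictions))
```

In practice, one such run per model, topic, and historical period would yield the per-condition F1 scores the abstract reports.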