Using Large Language Models for Text Annotation in Social Science and Humanities: A Hands-On Python/R Tutorial
Abstract
Large language models (LLMs) have become an essential tool for researchers in the social sciences and humanities (SSH) who work with textual data. One particularly valuable use case is automating text annotation, traditionally a time-consuming step in preparing data for empirical analysis. Yet many SSH researchers face two challenges: getting started with LLMs, and understanding how to evaluate and correct for their limitations. The rapid pace of model development can make LLMs appear inaccessible or intimidating, while even experienced users may overlook how annotation errors can bias results from downstream analyses (e.g., regression estimates, $p$-values), even when accuracy appears high. This tutorial provides a step-by-step, hands-on guide to using LLMs for text annotation in SSH research for both Python and R users. We cover (1) how to choose and access LLM APIs, (2) how to design and run annotation tasks programmatically, (3) how to evaluate annotation quality and iterate on prompts, (4) how to integrate annotations into statistical workflows while accounting for uncertainty, and (5) how to manage cost, efficiency, and reproducibility. Throughout, we provide concrete examples, code snippets, and best-practice checklists to help researchers confidently and transparently incorporate LLM-based annotation into their workflows.
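To make step (2) concrete, the sketch below shows one common way to frame an annotation task as a classification prompt and map the model's free-text response back onto a fixed label set. All function names here are hypothetical illustrations, not the tutorial's actual code; a real run would pass an API client call (e.g., via an LLM provider's Python SDK) in place of the stub.

```python
# Minimal sketch of programmatic LLM annotation (hypothetical helper names).
# The model call is abstracted as any callable mapping prompt -> response,
# so the parsing logic can be tested without network access or an API key.

LABELS = ["positive", "negative", "neutral"]

def build_prompt(text: str) -> str:
    """Frame a single document as a zero-shot classification prompt."""
    return (
        "Classify the sentiment of the following text as one of: "
        + ", ".join(LABELS) + ".\n"
        "Respond with the label only.\n\n"
        f"Text: {text}"
    )

def parse_label(response: str) -> str:
    """Normalize a raw model response; fall back to 'neutral' if it is
    not a valid label (a simple guard against malformed outputs)."""
    cleaned = response.strip().lower()
    return cleaned if cleaned in LABELS else "neutral"

def annotate(text: str, call_llm) -> str:
    """Annotate one document. `call_llm` is any prompt -> response callable,
    e.g. a wrapper around a provider SDK in a real workflow."""
    return parse_label(call_llm(build_prompt(text)))

# Example with a stub standing in for a real API call:
label = annotate("I loved this film.", lambda prompt: "Positive\n")
print(label)  # positive
```

Separating prompt construction, the model call, and response parsing this way also makes it easier to iterate on prompts (step 3) and to log raw responses for reproducibility (step 5).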