Cultural Alignment and Biases in Qualitative Coding: Comparing GPT and Human Coders


Abstract

Qualitative coding—the interpretive labeling of complex data—is shaped by the cultural background of the coder. As large language models (LLMs) become increasingly integrated into qualitative research in education, there is growing concern about whether their inferences capture the culturally grounded meanings that human coders bring to qualitative analysis. We compared coding decisions by six trained researchers from three cultural backgrounds (China, India, United States) with the decisions produced by GPT-4o under two prompting conditions: baseline and culture-specific. Using the same codebook, coders annotated 200 student observations from a U.S. game-based learning environment. Drawing on emic and etic frameworks, we measured within-culture, cross-culture, and human-model agreement, and analyzed coder rationales and coding divergences using thematic analysis. Our results show that while within-culture agreement was high, cross-culture agreement was low, reflecting divergent interpretive frames. Further, GPT-4o aligned more closely with U.S. coders than with Chinese or Indian coders (even when prompted to adopt a Chinese or Indian cultural perspective) and often struggled with implicit or context-dependent cultural markers. Beyond these patterns, our analysis also surfaced four key challenges that coders faced when interpreting and coding culture: (1) Unfamiliar Cultural Phenomenon, (2) Cultural Blind Spots, (3) Cultural Projection, and (4) Choosing the Cultural Lens. These results underscore the need for culturally reflective practices when incorporating LLMs into qualitative workflows in educational research.
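For readers unfamiliar with how within-culture, cross-culture, and human-model agreement are typically quantified, the sketch below illustrates one common approach: pairwise Cohen's kappa between coders. This is an illustrative example only, not the authors' analysis code; the coder names, code labels, and data are hypothetical.

```python
# Minimal sketch (not the paper's code): pairwise Cohen's kappa between coders.
# Assumes each coder labeled the same observations in the same order; all
# coder names and labels below are hypothetical placeholders.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

labels = {
    "us_coder_1": ["help_seeking", "off_task", "collaboration", "off_task"],
    "cn_coder_1": ["help_seeking", "collaboration", "collaboration", "off_task"],
    "gpt4o_base": ["help_seeking", "off_task", "off_task", "off_task"],
}

# Compute agreement for every pair: within-culture pairs, cross-culture pairs,
# and human-model pairs all fall out of the same pairwise comparison.
for a, b in combinations(labels, 2):
    kappa = cohen_kappa_score(labels[a], labels[b])
    print(f"{a} vs {b}: kappa = {kappa:.2f}")
```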
