Challenges for multilingual computational text analysis researchers: evidence from a survey of social scientists
Abstract
This paper investigates how the English-centric development of computational text analysis methods (CTAM) shapes the experiences and practices of social science researchers working with English, non-English, and multilingual texts. Drawing on a survey of 433 scholars who published text-based research in top social science journals between 2016 and 2020, we examine concerns about CTAM’s validity and availability, use of validation strategies, and barriers to working with multilingual corpora. Our survey findings indicate that researchers working with text in multiple languages express greater concern about CTAM’s validity, but do not report using more validation strategies. Furthermore, researchers whose native language is not English are more likely to rely on English-language texts when using CTAM than when not using it. These findings illustrate a structural bias in tool development and resource availability that pushes computational research toward English-language data. We conclude by outlining practical steps toward a more inclusive and linguistically diverse computational social science.