A primer for the use of classifier and generative large language models in social science research

Abstract

The emergence of generative AI models is rapidly changing the social sciences. Much has now been written on the ethical and epistemological considerations of using these tools. Meanwhile, AI-powered research increasingly makes its way to preprint servers. However, we see a gap between ethics and practice: while many researchers would like to use these tools, few, if any, practical guides on how to do so exist. This paper fills that gap by providing users with a hands-on application written in accessible language. The paper deals with what we consider the most likely and most advanced use case for AI in the social sciences: text annotation and classification. Our application guides readers through setting up a text classification pipeline and evaluating the results. The most important considerations concern reproducibility and transparency, open-source versus closed-source models, and the difference between classifier and generative models. The take-home message is this: these models provide unprecedented scale to augment research, but the community must take open-source and locally deployable models seriously in the interest of open science principles. Our code to reproduce the example can be accessed via GitHub.
