Using Generative AI to create lexicons for interpretable text models with high content validity
Abstract
Researchers often want to measure a broad variety of constructs such as anxiety, discrimination, or loneliness in text data from surveys, academic articles, interviews, social media, and electronic health records. In practice, using large language models (LLMs) remains infeasible for many researchers due to concerns around privacy, cost, compute requirements, and computational expertise, and researchers may prefer interpretable models to avoid mistakes in high-stakes scenarios. Lexicons offer a simpler solution by counting relevant words and phrases, which makes them very popular in the social sciences and as baseline models in computer science. Existing lexicons are limited because they measure a fixed set of constructs that is unlikely to cover the specific constructs of interest for a given application, resulting in low content validity. Building and validating new lexicons is resource intensive. In this study, we found that GPT-4 Turbo was able to automatically create a lexicon for 49 known risk factors for suicidal thoughts and behaviors, which we release as the Suicide Risk Lexicon. We used the lexicon counts to predict risk levels in crisis counseling conversations. After experts validated the lexicon, our model outperformed the widely used LIWC lexicon and performed similarly to deep learning models. Through feature importance analysis, we discovered that active suicidal ideation, suicidal planning, and direct self-injury were stronger indicators of imminent risk than passive suicidal ideation or depressed mood. To simplify creating and validating new lexicons for any domain of research, we introduce a protocol and Python package, construct-tracker, that works with a variety of generative AI models.
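To make the idea of "counting relevant words and phrases" concrete, here is a minimal sketch of lexicon-based feature extraction in plain Python. It does not use the construct-tracker API. The toy lexicon, the function name `count_constructs`, and the example sentence are all hypothetical and are not drawn from the Suicide Risk Lexicon. Matching is assumed to be case-insensitive and on whole words or phrases, which is one common choice among several.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: construct name -> words and phrases.
# Illustrative only; these entries are NOT from the Suicide Risk Lexicon.
TOY_LEXICON = {
    "loneliness": ["lonely", "alone", "no one to talk to"],
    "anxiety": ["anxious", "worried", "panic"],
}


def count_constructs(text, lexicon):
    """Count case-insensitive, whole-word matches of each construct's tokens."""
    counts = Counter()
    lowered = text.lower()
    for construct, tokens in lexicon.items():
        for token in tokens:
            # \b anchors keep "alone" from matching inside a longer word.
            pattern = r"\b" + re.escape(token.lower()) + r"\b"
            counts[construct] += len(re.findall(pattern, lowered))
    return dict(counts)


example = "I feel so alone lately. I'm worried all the time and have no one to talk to."
features = count_constructs(example, TOY_LEXICON)
print(features)  # {'loneliness': 2, 'anxiety': 1}
```

Per-construct counts like these can serve as inputs to an interpretable model, since each feature maps directly to a named construct and to the specific words that triggered it.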