From Embeddings to Explainability: A Tutorial on Transformer-Based Text Analysis for Social and Behavioral Scientists

Abstract

Large language models and their use for text analysis have had a significant impact on psychology and the social and behavioral sciences more broadly. Key applications include analyzing texts, such as social media posts, to infer psychological characteristics, as well as analyzing surveys and interviews. In this tutorial paper, we demonstrate in a practical exercise how the Python-based natural language processing package transformers (and related modules from the Hugging Face ecosystem) can be used to automatically classify text inputs. In doing so, we rely on pretrained transformer models that can be fine-tuned to a specific task and domain. The first proposed application of this model class is as a feature extractor that transforms written text into real-valued numerical vectors (called "embeddings") capturing a text's semantic meaning; these vectors can, in turn, serve as input for a subsequent machine-learning model. The second application is end-to-end training (so-called "fine-tuning") of the model, in which the same model that maps the text to embeddings also predicts the label directly. While fine-tuning usually yields better results and a more seamless training process, the resulting model is often not directly interpretable. We show how this issue can be alleviated with post-hoc interpretability methods, calculating SHAP values and applying local interpretable model-agnostic explanations (LIME) to explain the model's inner workings.
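
To make the first application concrete, the following is a minimal sketch of using a pretrained transformer as a feature extractor. The checkpoint bert-base-uncased, the example sentences, and the mean-pooling step are illustrative assumptions, not choices prescribed by the paper.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; any pretrained encoder from the Hub would do.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

texts = ["I really enjoyed the workshop.", "The survey felt far too long."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings (masking out padding) into one
# fixed-length vector per input text.
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, 768]) for bert-base-uncased
```

The resulting matrix can then be fed to any downstream model, for example a scikit-learn classifier, as the abstract describes.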
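For the second application, a minimal fine-tuning sketch with the Trainer API might look as follows. The checkpoint, the IMDB dataset, the subsampling, and all hyperparameters are placeholder assumptions standing in for the tutorial's own task and data.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Illustrative checkpoint and dataset; the tutorial's own data would go here.
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                           num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=16),
    # Small subsets keep this sketch cheap to run; use the full splits in practice.
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].shuffle(seed=42).select(range(500)),
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```

Here the model maps text straight to label probabilities, with no separate embedding-plus-classifier pipeline.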
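Finally, a sketch of the post-hoc explanation step, applying SHAP and LIME to a fine-tuned classifier wrapped in a transformers pipeline. The sentiment checkpoint, the example sentence, and the class ordering are illustrative assumptions; top_k=None (all-label scores) assumes a recent transformers version.

```python
import numpy as np
import shap
from lime.lime_text import LimeTextExplainer
from transformers import pipeline

# Illustrative fine-tuned checkpoint; in the tutorial this would be the
# model trained in the previous step.
clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english",
               top_k=None)  # return scores for all labels

text = "The interview left me cautiously optimistic."

# SHAP: attribute the prediction to individual tokens.
shap_explainer = shap.Explainer(clf)
shap_values = shap_explainer([text])
print(shap_values)

# LIME: fit a local surrogate model on perturbed versions of the text.
def predict_proba(texts):
    results = clf(list(texts))
    # Sort by label name so probability columns align across examples.
    return np.array([[d["score"] for d in sorted(r, key=lambda d: d["label"])]
                     for r in results])

lime_explainer = LimeTextExplainer(class_names=["NEGATIVE", "POSITIVE"])
explanation = lime_explainer.explain_instance(text, predict_proba,
                                              num_features=5)
print(explanation.as_list())  # top word-level contributions
```

Both methods operate on the trained model from the outside, which is what makes them applicable even when the fine-tuned transformer itself is a black box.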