Sentiment Analysis of Naturalistic Speech Using Open-Weight Large Language Models

Abstract

Psychological research increasingly relies on computational methods to track emotion in naturalistic text. However, standard lexicon-based tools often miss semantic nuance, while powerful commercial Large Language Models (LLMs) raise privacy concerns for sensitive data. This study evaluates the efficacy of 24 open-weight LLMs (1B-120B parameters) for zero-shot sentiment analysis of spoken language, running entirely on local hardware. In this paradigm, models classify text relying solely on natural language instructions rather than labeled training examples. We compared model performance against three increasingly difficult baselines (naive, standard, and human) across transcripts from two datasets: 193 autobiographical narratives from community participants (N=49) and 292 longitudinal audio journals from psychiatric outpatients (N=64). Results demonstrate that open-weight models significantly outperform standard lexicon-based sentiment tools and frequently surpass individual human raters. Crucially, certain mid-sized models rival the performance of much larger systems, making state-of-the-art analysis accessible via consumer hardware. Additionally, we validate a fully automated privacy-preserving pipeline, finding that transcription errors from automatic speech recognition did not significantly degrade downstream sentiment accuracy. Despite these strengths, a multidimensional fairness audit revealed several demographic disparities: our best-performing models exhibited miscalibration in specific subgroups, lower sensitivity for female speakers in the community dataset, and lower predictive precision for Black speakers in the clinical dataset. Taken together, these findings demonstrate that recent open-weight LLMs can match or surpass human-level performance in sentiment analysis of naturalistic speech while running efficiently and securely on workstation hardware. These advances open opportunities for studying emotional dynamics in daily life and developing privacy-preserving clinical tools.
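To illustrate the zero-shot paradigm described above, the sketch below prompts a locally hosted open-weight LLM to label a transcript excerpt using only a natural language instruction, with no labeled training examples. This is a minimal illustration only: the model tag, endpoint (an Ollama-style local server is assumed), prompt wording, and label set are hypothetical and are not the prompts, models, or pipeline reported in the study.

```python
# Minimal zero-shot sentiment sketch against a locally served open-weight LLM.
# Assumes an Ollama-compatible endpoint at localhost:11434; model name, prompt,
# and labels are illustrative, not those used in the study.
import json
import urllib.request

PROMPT_TEMPLATE = (
    "Rate the overall sentiment of the following spoken-language transcript "
    "as exactly one word: positive, negative, or neutral.\n\n"
    "Transcript:\n{transcript}\n\nSentiment:"
)

def classify_sentiment(transcript: str,
                       model: str = "llama3.1:8b",  # hypothetical local model tag
                       url: str = "http://localhost:11434/api/generate") -> str:
    """Send a zero-shot instruction to a local model and return its one-word label."""
    payload = json.dumps({
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(transcript=transcript),
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())["response"]
    return reply.strip().lower().split()[0]  # keep only the first word as the label

if __name__ == "__main__":
    print(classify_sentiment("I finally finished the project and I feel great about it."))
```

Because the text never leaves the local machine, this kind of setup preserves the privacy properties that motivate the use of open-weight models for sensitive clinical data.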
