Sentiment Analysis of Naturalistic Speech Using Open-Weight Large Language Models

Abstract

Artificial intelligence is increasingly used to analyze emotion in text, but its performance on naturalistic speech remains poorly understood. We evaluated 20 large language models (LLMs), ranging from 1 to 120 billion parameters, on their ability to estimate fine-grained sentiment in transcripts of naturalistic speech. All models were open-weight, meaning their parameters are publicly available and can be run locally without reliance on cloud services. Two datasets were used: the Stanford Emotional Narratives Dataset, consisting of autobiographical stories from community participants, and the Bipolar Longitudinal Study, consisting of daily audio journals from outpatients with severe mental illness. Human sentiment ratings served as the reference, and models were benchmarked against both individual human agreement and lexicon-based tools (LIWC, VADER). Results showed a three-tier accuracy pattern: small models performed at or below baseline tools, mid-sized models (7-30 billion parameters) matched human agreement, and larger or optimized models exceeded it. Accuracy improved with scaling up to ~12 billion parameters, after which gains plateaued, highlighting diminishing returns beyond mid-sized architectures. Performance was robust to transcription source, indicating that automatic transcription errors did not impair accuracy. Fairness analyses showed overall accuracy was equitable across demographic groups, though modest disparities appeared in prediction consistency by participant race and education. These findings demonstrate that recent open-weight LLMs can match or surpass human-level performance in sentiment analysis of naturalistic speech while running efficiently on affordable workstation hardware. These advances open opportunities for studying emotional dynamics in daily life and developing privacy-preserving tools for clinical research.
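
The sketch below is not from the paper; it is a minimal illustration of the kind of benchmarking the abstract describes, in which model-derived sentiment scores are compared against human reference ratings alongside a lexicon-based baseline (VADER). The example segments, the human ratings, and the placeholder LLM scores are hypothetical; only the VADER API call and the Pearson correlation are standard library usage.

```python
"""Illustrative benchmarking sketch: compare sentiment scores from a
lexicon baseline (VADER) and a hypothetical local LLM against human
reference ratings using Pearson correlation."""
import numpy as np
from scipy.stats import pearsonr
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Hypothetical transcript segments with mean human sentiment ratings
# rescaled to [-1, 1]; in the study these would come from the datasets'
# human annotations.
segments = [
    "I finally got the job offer and I could not stop smiling all day.",
    "The appointment was fine, nothing really happened.",
    "I have been exhausted and everything feels like too much lately.",
]
human_ratings = np.array([0.8, 0.0, -0.7])

# Placeholder scores standing in for output from a locally run
# open-weight LLM (hypothetical values).
llm_scores = np.array([0.9, 0.1, -0.8])

# Lexicon-based baseline: VADER compound score per segment.
analyzer = SentimentIntensityAnalyzer()
vader_scores = np.array(
    [analyzer.polarity_scores(s)["compound"] for s in segments]
)

# Benchmark both score sets against the human reference.
for name, scores in [("VADER", vader_scores), ("LLM", llm_scores)]:
    r, p = pearsonr(scores, human_ratings)
    print(f"{name}: r = {r:.2f} (p = {p:.3f})")
```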
