Don’t Look Up: Evaluating the Tradeoff between Performance and Sustainability of LLMs for Text Analysis.
Abstract
Large language models (LLMs) are widely used as research tools, but their high resource demands raise significant environmental concerns. While LLMs offer advantages in certain applications, their energy footprint prompts a necessary question for social scientists: is it worth using an LLM for every text analysis task? This study systematically evaluates the trade-off between performance and energy usage across computational text analysis methods, including dictionaries, trained classifiers, and open “local” LLMs. Applying sentiment analysis, multi-class classification, and named entity recognition to political documents, we measure energy consumption, CO2 emissions, correlation with human raters, F1-score, and processing time. We find that LLMs perform well on sentiment analysis, closely matching human judgment, but at relatively high environmental cost. For classification and named entity recognition, task-specific models achieve superior accuracy at low environmental impact. Contrary to multi-purpose LLM benchmarks, larger parameter counts do not guarantee better performance on text classification tasks. Introducing a CO2-adjusted F1-score, we observe that smaller, more efficient models, such as Mistral-Nemo (12B), outperform larger quantized models like DeepSeek-R1 (32B). Our findings highlight the need for thoughtful model selection rather than defaulting to LLMs: a “right-fit” approach that employs lighter, task-specific methods offers both performance and sustainability benefits.
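The abstract names a CO2-adjusted F1-score but does not state its formula. As a minimal sketch of how such a metric could fold emissions into an accuracy score, the snippet below discounts F1 by log-scaled CO2 output; the function name, the `log1p` penalty, the `alpha` weight, and all input numbers are illustrative assumptions, not the authors' definition or results.

```python
import math

def co2_adjusted_f1(f1: float, co2_grams: float, alpha: float = 1.0) -> float:
    """Hypothetical sketch: discount a model's F1-score by its CO2 footprint.

    f1         -- raw F1-score in [0, 1]
    co2_grams  -- grams of CO2 emitted while processing the evaluation set
    alpha      -- strength of the emissions penalty (assumed knob)
    """
    # log1p keeps the penalty gentle for small footprints and avoids
    # division issues when emissions are near zero.
    return f1 / (1.0 + alpha * math.log1p(co2_grams))

# Illustrative comparison only (made-up numbers, not the paper's data):
# a smaller efficient model vs. a larger, more emission-heavy one.
small = co2_adjusted_f1(f1=0.81, co2_grams=12.0)
large = co2_adjusted_f1(f1=0.83, co2_grams=95.0)
print(f"small: {small:.3f}, large: {large:.3f}")  # the small model wins after adjustment
```

Under any construction of this kind, a slightly more accurate but far more emission-intensive model can rank below a leaner one, which is consistent with the paper's observation that a 12B model can outperform a quantized 32B model once emissions are accounted for.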