Enhancing Sentiment Analysis with Term Sentiment Entropy: Capturing Nuanced Sentiment in Text Classification

Abstract

Sentiment analysis benefits from representations that highlight polarity-bearing terms while suppressing sentiment-ambivalent ones. This paper introduces Term Sentiment Entropy (TSE), a supervised, information-theoretic global factor for sparse text representation. TSE quantifies how selectively a term associates with sentiment labels in the training fold, and it is composed with TF-IDF to up-weight terms that are distributionally concentrated within a class and down-weight those that are diffuse across classes. We evaluate the approach on four public datasets spanning product reviews, social media, and long-form movie reviews under a fixed protocol with Naïve Bayes, Random Forest, and a linear Support Vector Classifier. Results reported as Accuracy, Macro-Precision, Macro-Recall, and Macro-F1 show that TF-IDF plus TSE often matches or improves performance on short and noisy texts such as Amazon cell-phone reviews and two Twitter corpora, while achieving near-ceiling parity with strong baselines on IMDb. The method is lightweight, reproducible, and compatible with conventional preprocessing and feature-selection pipelines because it requires only label statistics from the training data and no external lexicons. We also discuss limitations related to label quality and class imbalance, and we outline imbalance-aware and learned variants of TSE as natural extensions.
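The abstract does not give the exact formula for TSE, but the description — a supervised, information-theoretic weight that measures how selectively a term concentrates in one sentiment class — suggests a sketch along the following lines. Here TSE is assumed to be one minus the normalized entropy of a term's class distribution in the training fold; the function name and normalization are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict

def tse_weights(docs, labels):
    """Hypothetical sketch of a TSE-style global factor.

    Weight(term) = 1 - normalized entropy of the term's class
    distribution over training documents: 1.0 for a term that occurs
    in only one class, 0.0 for a term spread evenly across classes.
    """
    counts = defaultdict(lambda: defaultdict(int))  # term -> class -> doc count
    classes = set(labels)
    for doc, y in zip(docs, labels):
        for term in set(doc.split()):  # document frequency, not raw counts
            counts[term][y] += 1
    k = len(classes)  # entropy base k normalizes to [0, 1]
    weights = {}
    for term, by_class in counts.items():
        total = sum(by_class.values())
        probs = [c / total for c in by_class.values()]
        h = -sum(p * math.log(p, k) for p in probs if p > 0)
        weights[term] = 1.0 - h
    return weights
```

Composed with TF-IDF as described in the abstract, the final feature value would be `tfidf(t, d) * weights[t]`, so polarity-bearing terms are up-weighted and sentiment-ambivalent ones suppressed.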
