Evaluating the Quality of Data: Case of Sarcasm Dataset

Abstract

Artificial intelligence (AI) models rely on data as their primary fuel; accurate and efficient models built from high-quality data help ensure the safe use of AI. Sentiment analysis (SA), one of the tasks in natural language processing (NLP), relies heavily on sarcasm detection. The cryptic nature of sarcasm, however, makes detection difficult and degrades the quality of sentiment analysis. Although the problem has been investigated thoroughly, progress has been limited by improper labeling and by data that were not gathered specifically for sarcasm detection. This paper evaluates the quality of sarcasm data through the performance of similarly parameterized models. For this analysis, we compiled four distinct datasets: SARC, SemEval2022, NewsHeadline, and Multimodal. Under-sampling and over-sampling techniques were used to balance the data size as well as the class-label variation among the corpora. We performed extensive and fair evaluations of models ranging from classical machine learning to transfer learning algorithms, employing TF-IDF vectorization and word-embedding text representations. Based on the experimental results, the NewsHeadline corpus exhibited the highest quality, achieving a notable F1 score of 0.93 with RoBERTa. We have created a new dataset (Sarcasm-Quality) by combining the best-performing datasets from the experimental analysis and have made it publicly available.
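
As a rough illustration of the kind of evaluation pipeline the abstract describes for the classical machine-learning baselines, the sketch below vectorizes a sarcasm corpus with TF-IDF and scores a classifier with F1. The file name, column names, and choice of logistic regression are illustrative assumptions, not the authors' exact setup.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Hypothetical balanced sarcasm corpus with "text" and binary "label" columns.
df = pd.read_csv("sarcasm_corpus.csv")

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

# TF-IDF text representation, as mentioned in the abstract.
vectorizer = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# A simple classical baseline; the paper also evaluates transfer-learning models.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)

# F1 is the metric the abstract reports (e.g., 0.93 for RoBERTa on NewsHeadline).
print("F1:", f1_score(y_test, clf.predict(X_test_vec)))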
