Performance Analysis of RoBERTa in Detecting Sexism in Online Comments

Abstract

Detecting and mitigating sexist language has become a critical issue in digital communication. While human experts can identify nuanced forms of sexism, the growing volume of online content makes manual detection impractical. This study compares four machine learning approaches for automated sexism detection: trigram frequency models, text vectorization techniques, convolutional neural networks (CNNs), and RoBERTa, a transformer-based model. Traditional methods such as trigram analysis and text vectorization are useful for identifying basic patterns but struggle to capture the contextual and semantic nuances inherent in sexist language. In contrast, more advanced models, such as CNNs and RoBERTa, leverage a deeper understanding of language structure and context. Using a publicly available dataset, we evaluate the performance of these models based on accuracy, precision, recall, and F1-score. Our findings reveal that while trigram analysis and text vectorization provide some insight into surface-level patterns, RoBERTa consistently outperforms the other models, capturing the subtleties of sexist language and producing more accurate and reliable results. This research not only improves the technical methodologies for sexism detection but also contributes to the development of scalable, automated moderation tools that can address harmful linguistic patterns in real time, promoting safer and more inclusive online environments.
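
The abstract does not include implementation details. As a minimal, hedged sketch of the transformer-based approach it describes, the example below fine-tunes a RoBERTa classifier on a binary sexist/non-sexist dataset and reports the four metrics named above. The checkpoint name (roberta-base), file names (train.csv, test.csv), and column names (text, label) are assumptions for illustration, not details taken from the article.

```python
# Sketch only: fine-tune RoBERTa for binary sexism detection and report
# accuracy, precision, recall, and F1. Assumes CSV files with "text" and
# "label" (1 = sexist, 0 = not sexist) columns; these names are hypothetical.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "roberta-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical dataset splits; replace with the actual files.
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Convert raw comment text into RoBERTa input IDs and attention masks.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Compute the four evaluation metrics used in the study.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="roberta-sexism",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())  # prints eval_accuracy, eval_precision, eval_recall, eval_f1
```

The same evaluation function could be reused for the trigram, vectorization, and CNN baselines, so that all four models are scored on an identical metric set.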
