Integrating Explainability for Sentiment Interpretation, Misclassification, and Bias Detection in Women-in-STEM Social Media


Abstract

Transformer-based models have advanced sentiment analysis but remain difficult to interpret, especially in sensitive domains such as public discourse about women in science, technology, engineering, and mathematics (STEM). Building on earlier work on women-in-STEM sentiment, this study introduces an ethically curated corpus of over 140,000 English tweets/X posts and a validated automatic labelling pipeline that combines hand-annotated data with state-of-the-art transformer-based sentiment models. We quantitatively compare several transfer-learning approaches and identify the best-performing model for this domain, achieving high overall accuracy while revealing systematic confusions between neutral and mildly positive content. To open this “black box,” we apply SHapley Additive exPlanations (SHAP) and Integrated Gradients (IG), two explainable-AI (XAI) methods, to obtain word-level attributions for correctly and incorrectly classified tweets, showing how specific linguistic cues, such as celebratory hashtags, negation, and emotionally charged terms, drive sentiment predictions and common error modes. We further design a bias-probing protocol based on minimally different gendered sentence pairs (e.g., “Women in STEM” vs. “Men in STEM”) and show that the model assigns systematically different sentiment scores and attributions to male- and female-marked variants, indicating learned gender bias. All data processing scripts, model configurations, and analysis code are released to support transparency, reproducibility, and future research on explainable and fair sentiment analysis in socially critical contexts.
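The minimal-pair bias probe can be illustrated with a short sketch. The checkpoint, templates, and printed comparison below are illustrative assumptions rather than the paper's actual configuration; any Hugging Face sentiment model fine-tuned on tweets could be substituted.

```python
# Sketch of a minimal-pair gender bias probe (illustrative, not the paper's exact setup).
from transformers import pipeline

clf = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",  # assumed stand-in model
)

# Minimally different gendered sentence pairs: identical templates,
# only the gender marker changes.
templates = [
    "{} in STEM are changing the world.",
    "{} in STEM still face barriers to promotion.",
    "Proud to celebrate {} in STEM today!",
]

for tpl in templates:
    female_text, male_text = tpl.format("Women"), tpl.format("Men")
    f_pred, m_pred = clf(female_text)[0], clf(male_text)[0]
    # A systematic gap in predicted label or confidence across many such
    # pairs is read as evidence of learned gender bias.
    print(tpl)
    print(f"  Women: {f_pred['label']} ({f_pred['score']:.3f})")
    print(f"  Men:   {m_pred['label']} ({m_pred['score']:.3f})")
```

In the study itself, such score gaps are complemented by SHAP and IG attributions over the same sentence pairs, so that any difference can be traced back to the gendered tokens driving the prediction.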
