Aspect-level Multimodal Sentiment Analysis Model Based on Multi-scale Feature Extraction
Abstract
In existing multimodal sentiment analysis methods, feature extraction typically uses only the output of BERT's last layer, neglecting the abundant information in its intermediate layers. This paper proposes an Aspect-level Multimodal Sentiment Analysis Model with Multi-scale Feature Extraction (AMSAM-MFE). The model conducts sentiment analysis on both text and images. For text feature extraction, it adds a Multi-scale Layer module on top of BERT and uses aspect terms to supervise text feature extraction, improving text processing performance. For image feature extraction, the model employs a pre-trained ResNeSt-269 model with a specially designed Supervision Layer to improve effectiveness. For feature fusion, the Tensor Fusion Network method is adopted to achieve comprehensive interaction between visual and textual features. Experimental comparisons with other multimodal sentiment analysis models on the Twitter-2015 and Twitter-2017 datasets show that the proposed model achieves higher accuracy and F1 scores on aspect-level multimodal sentiment analysis tasks than traditional multimodal sentiment analysis models.
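To make the two central ideas of the abstract concrete, the following is a minimal PyTorch sketch of (1) pooling BERT intermediate-layer hidden states instead of using only the last layer, and (2) Tensor Fusion Network (TFN)-style outer-product fusion of text and image features. The layer choices, pooling strategy, placeholder image feature, and dimensions are illustrative assumptions; the paper's actual Multi-scale Layer, Supervision Layer, and ResNeSt-269 pipeline are not reproduced here.

```python
import torch
from transformers import BertModel, BertTokenizer

# Load BERT with all hidden states exposed (13 tensors: embeddings + 12 layers).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

sentence = "the food was great but the service was slow"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = bert(**inputs)

hidden_states = outputs.hidden_states  # tuple of (batch, seq_len, 768) tensors

# Multi-scale text feature: average the [CLS] vectors from several layers
# (the layer set {4, 8, 12} is an assumption, not the paper's configuration).
layer_ids = [4, 8, 12]
text_feat = torch.stack(
    [hidden_states[i][:, 0, :] for i in layer_ids]
).mean(dim=0)  # (1, 768)

# Placeholder for a pooled image feature, e.g. a ResNeSt output projected
# to the same dimensionality as the text feature.
image_feat = torch.randn(1, 768)

# TFN-style fusion: append a constant 1 to each modality vector so the
# outer product retains unimodal terms alongside bimodal interactions,
# then flatten the resulting interaction matrix.
ones = torch.ones(1, 1)
t = torch.cat([text_feat, ones], dim=1)   # (1, 769)
v = torch.cat([image_feat, ones], dim=1)  # (1, 769)
fused = torch.bmm(t.unsqueeze(2), v.unsqueeze(1)).flatten(1)  # (1, 769*769)

# Simple classification head over the fused feature (3 sentiment classes).
classifier = torch.nn.Linear(fused.size(1), 3)
logits = classifier(fused)
```

The appended constant 1 is the standard TFN construction: it ensures the flattened outer product contains each modality's original features as well as their pairwise products, so the fusion captures unimodal and cross-modal information in a single tensor.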