A Method for Detecting Unknown Insights from Text Using Semantic Similarity: A Deep Learning Approach
Abstract
This study explores an efficient method for extracting novel insights from free-text data using Semantic Textual Similarity (STS). With the growing use of social media platforms such as X (formerly Twitter), large-scale free-text data has become a valuable resource for real-time analysis in various fields. Manual classification struggles with datasets this large and diverse, so computational approaches based on Natural Language Processing (NLP) are needed. The proposed method uses STS to prioritize texts for analysis: it measures the semantic similarity between existing insights and new data, which streamlines the discovery of previously unobserved insights. The method was evaluated on two datasets, one on public opinions about a film company and one on microaggressions experienced by Chinese students in Japan. In both evaluations, the STS-based approach detected novel insights significantly more efficiently than random sampling, even on multilingual and small datasets.
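The prioritization idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the abstract does not state the ranking direction, so the sketch assumes that texts least similar to every known insight are reviewed first; the function names `bow_cosine` and `rank_by_novelty` are hypothetical; and the bag-of-words cosine is only a stand-in for a deep-learning STS model, which would be passed in through the `similarity` parameter.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def bow_cosine(a: str, b: str) -> float:
    """Placeholder similarity: cosine over word counts.

    This is NOT a real STS model; it only stands in for one so the
    sketch runs with the standard library alone.
    """
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_novelty(
    known_insights: List[str],
    new_texts: List[str],
    similarity: Callable[[str, str], float] = bow_cosine,
) -> List[Tuple[float, str]]:
    """Order new texts so that likely-novel ones come first.

    Each text is scored by its highest similarity to any known insight
    (assumed ranking rule); a low score means no known insight covers it.
    """
    scored = [
        (max((similarity(text, k) for k in known_insights), default=0.0), text)
        for text in new_texts
    ]
    return sorted(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Invented example data, for illustration only.
    known = ["the new film was too long", "ticket prices are too high"]
    incoming = [
        "ticket prices are too high these days",
        "the streaming app keeps crashing",
        "the new film was far too long",
    ]
    for score, text in rank_by_novelty(known, incoming):
        print(f"{score:.2f}  {text}")
```

In this toy run the complaint about the streaming app is ranked first because neither known insight resembles it, so a reviewer would read it before the two near-duplicates of existing insights. Swapping `bow_cosine` for a sentence-embedding model changes only the `similarity` argument.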