An Innovative Approach to Topic Clustering for Social Media and Web Data Using AI

Abstract

The vast amount of social media and web data offers valuable insights for purposes such as brand reputation management, topic research, competitive analysis, product development, and public opinion surveys. However, analysing these data to identify patterns and extract insights is challenging because of the sheer volume of posts, which can reach thousands in a single day. One practical approach is topic clustering, which groups mentions that refer to the same topic. This process yields several manageable clusters, each containing hundreds or thousands of posts. These clusters give a more meaningful overview of the topics under discussion and remove the need to categorise each post manually. Several topic detection algorithms can cluster posts, including LDA, NMF, and BERTopic. However, these algorithms have important drawbacks, including language constraints and slow or resource-intensive data processing. Moreover, cluster labels typically consist of a few keywords that may not make sense unless one explores the mentions within the cluster. The recent introduction of large language models (LLMs) such as GPT-4 makes it possible to build new topic clustering techniques that address these issues. Our novel approach, AI Mention Clustering, uses LLMs at its core to cluster web and social data efficiently and accurately. We tested it on social and web data and compared it with BERTopic, a popular existing algorithm. Our approach demonstrated superior resource efficiency and absolute accuracy of clustered documents. It also produces cluster summaries that humans can easily understand, rather than just representative keywords. By providing more meaningful and interpretable results, this approach makes social and web data researchers more productive.