Exploring Semanticity-Based Clustering of Text Using Transformer Models: Advancing AI Applications in Education and Beyond

Abstract

This study explores semantic clustering of text using transformer models to overcome the limitations of traditional text clustering approaches. While conventional methods rely on word frequency, this research leverages the contextual understanding capabilities of BERT and SciBERT for more nuanced text organization. The methodology combines transformer-based semantic embeddings with various pooling strategies and clustering algorithms, comparing their performance against TF-IDF baselines. Experiments spanned five diverse domains: news, research papers, e-commerce products, movies, and job postings. Transformer-based embeddings with CLS pooling consistently outperformed traditional methods, producing more coherent clusters across all domains, and SciBERT proved particularly useful for scientific text. These findings suggest applications in personalized learning systems, content organization, and recommender systems where semantic interpretation is critical. The research provides a framework for developing text clustering solutions better suited to capturing contextual relationships and semantic intricacies in complex document collections.
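As a rough illustration of the pipeline the abstract describes, the sketch below embeds a few texts with CLS-pooled BERT vectors, builds a TF-IDF baseline, and clusters both with k-means. It is not the authors' implementation: the model name, cluster count, sample texts, and the use of silhouette score as the cohesion metric are all assumptions made for this example.

```python
# Minimal sketch (assumed details, not the paper's code): compare
# BERT CLS-pooled embeddings against a TF-IDF baseline for clustering.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Toy corpus, invented for illustration only.
texts = [
    "The central bank raised interest rates again this quarter.",
    "Stock markets rallied after the latest inflation report.",
    "A new transformer architecture improves language understanding.",
    "Researchers fine-tune BERT for scientific document retrieval.",
]

# CLS pooling: use the hidden state of the [CLS] token as a
# fixed-size contextual representation of each text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    cls_embeddings = model(**batch).last_hidden_state[:, 0, :].numpy()

# TF-IDF baseline: sparse word-frequency vectors with no context.
tfidf_vectors = TfidfVectorizer().fit_transform(texts).toarray()

# Cluster both representations and report cluster cohesion.
n_clusters = 2  # assumed for this toy corpus
for name, vectors in [("BERT-CLS", cls_embeddings), ("TF-IDF", tfidf_vectors)]:
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(vectors)
    print(name, labels, silhouette_score(vectors, labels))
```

Swapping `bert-base-uncased` for a SciBERT checkpoint (e.g. one published on the Hugging Face hub) would follow the paper's suggestion for scientific text; the clustering step is unchanged.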
