String Vector based AHC Algorithm for Clustering Words Semantically
Abstract
This article proposes a modified AHC (Agglomerative Hierarchical Clustering) algorithm that clusters string vectors, instead of numerical vectors, as an approach to word clustering. Two facts motivate this research. First, string vector based algorithms were applied successfully to text clustering in previous works. Second, combining text clustering and word clustering is expected to produce a synergy effect between them. In this research, we define an operation on string vectors called semantic similarity, and we modify the AHC algorithm to use this similarity metric for word clustering. The proposed AHC algorithm is empirically validated as the better approach for clustering words in news articles and opinions. Further operations on string vectors still need to be defined and characterized mathematically before more advanced machine learning algorithms can be modified in the same way.
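The abstract does not give the formula for semantic similarity, so the sketch below is only an illustration of the general idea. It shows how an AHC loop can work directly on string vectors once a pairwise similarity between them exists. Several parts are assumptions rather than the paper's method. A string vector is modeled as a list of strings, such as the features associated with a word. Jaccard overlap (`overlap_similarity`) stands in as a hypothetical placeholder for the paper's semantic similarity. Clusters are merged by average linkage. The function names `ahc` and `overlap_similarity` are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence

# Assumption: a "string vector" is modeled as an ordered list of strings.
StringVector = Sequence[str]


def overlap_similarity(a: StringVector, b: StringVector) -> float:
    """Hypothetical stand-in for the paper's semantic similarity:
    Jaccard overlap between the sets of strings in two string vectors."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def ahc(vectors: List[StringVector], n_clusters: int,
        sim: Callable[[StringVector, StringVector], float] = overlap_similarity
        ) -> List[List[int]]:
    """Agglomerative clustering with average linkage (an assumed choice).
    Returns the clusters as lists of item indices."""
    if not 1 <= n_clusters <= len(vectors):
        raise ValueError("n_clusters must be between 1 and the number of items")
    n = len(vectors)
    # Precompute the similarity between every pair of individual string vectors.
    pair = [[sim(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # every item starts in its own cluster

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average similarity over all cross-cluster item pairs.
        return sum(pair[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Pick the most similar pair of clusters (a < b) and merge b into a.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    # Toy, made-up string vectors for four words (illustration only).
    words = ["stock", "market", "soccer", "goal"]
    vectors = [["finance", "trade", "price"], ["finance", "trade", "buyer"],
               ["sport", "team", "match"], ["sport", "match", "score"]]
    for cluster in ahc(vectors, n_clusters=2):
        print([words[i] for i in cluster])
```

The similarity function is passed in as a parameter. This keeps the clustering loop separate from the metric, so a proper definition of semantic similarity on string vectors could replace the placeholder without any change to the AHC procedure itself.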