Leveraging Large Language Models and Embedding Representations for Enhanced Word Similarity Computation


Abstract

Current mainstream methods for computing word similarity often struggle to precisely capture the fine-grained semantics of words across different contexts. In particular, generative semantic representations typically suffer from part-of-speech bias, semantic ambiguity, redundant exemplars, and informational redundancy, all of which compromise the accuracy of similarity measurements. To address these problems, this paper proposes WSLE, a word similarity computation framework that integrates the semantic generation capabilities of large language models (LLMs) with embedding-based vector representations. First, WSLE targets these four challenges by applying constraints to lexical items, grammatical categories, semantic descriptions, and prompt length, enabling LLMs to generate coherent, precise, and contextually rich semantic representations. Second, the generated semantic representations are transformed into high-dimensional vector embeddings via a deep semantic embedding module, allowing the semantic similarity between words to be assessed quantitatively. Finally, the effectiveness of WSLE is evaluated using Pearson's correlation coefficient (r) and Spearman's rank correlation coefficient (ρ). Experimental results on benchmark datasets, including RG65, MC30, YP130, and MED38, demonstrate that WSLE significantly outperforms existing similarity computation methods, with notable advantages in accuracy and robustness for word similarity measurement tasks.
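The pipeline described in the abstract (constrained LLM generation, deep semantic embedding, similarity scoring, and correlation-based evaluation) can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: the `generate` and `embed` callables stand in for an arbitrary LLM completion API and sentence-embedding model, and the prompt wording, function names, and constraint parameters are hypothetical illustrations of the kind of lexical, grammatical, and length constraints the abstract mentions.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def build_constrained_prompt(word, pos=None, max_words=60):
    """Hypothetical prompt template: pins down the target lexical item,
    its grammatical category, and the length of the semantic description,
    aiming to reduce part-of-speech bias, redundant exemplars, and
    informational redundancy in the generated representation."""
    pos_clause = f" used as a {pos}" if pos else ""
    return (
        f"Define the word '{word}'{pos_clause} in at most {max_words} words. "
        "Give a single precise sense description without example sentences."
    )


def word_similarity(word_a, word_b, generate, embed, pos=None):
    """Sketch of a WSLE-style pipeline:
    1) an LLM produces a constrained semantic representation for each word,
    2) an embedding model maps each representation to a dense vector,
    3) cosine similarity quantifies how close the two words are."""
    desc_a = generate(build_constrained_prompt(word_a, pos))
    desc_b = generate(build_constrained_prompt(word_b, pos))
    return cosine_similarity(embed(desc_a), embed(desc_b))


def evaluate(predicted_scores, gold_scores):
    """Pearson's r and Spearman's rho between predicted similarities and
    human ratings (e.g., the gold scores in RG65, MC30, YP130, or MED38)."""
    r, _ = pearsonr(predicted_scores, gold_scores)
    rho, _ = spearmanr(predicted_scores, gold_scores)
    return r, rho
```

In use, `word_similarity` would be applied to every word pair in a benchmark, and `evaluate` would correlate the resulting scores with the dataset's human similarity ratings, mirroring the r and ρ evaluation reported in the abstract.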
