Leveraging Large Language Models and Embedding Representations for Enhanced Word Similarity Computation


Abstract

Current mainstream methods for computing word similarity often struggle to precisely capture the fine-grained semantics of words across different contexts. In particular, generative semantic representations typically suffer from part-of-speech bias, semantic ambiguity, redundant exemplars, and informational redundancy, all of which compromise the accuracy of similarity measurements. To address these problems, this paper proposes WSLE, a word similarity computation framework that integrates the semantic generation capabilities of large language models (LLMs) with embedding-based vector representations. First, WSLE targets these four challenges by applying constraints to lexical items, grammatical categories, semantic descriptions, and prompt length, enabling LLMs to generate coherent, precise, and contextually rich semantic representations. Second, the generated semantic representations are transformed into high-dimensional vector embeddings via a deep semantic embedding module, allowing the semantic similarity between words to be assessed quantitatively. Finally, the effectiveness of WSLE is evaluated using Pearson's correlation coefficient (r) and Spearman's rank correlation coefficient (ρ). Experimental results on benchmark datasets, including RG65, MC30, YP130, and MED38, demonstrate that WSLE significantly outperforms existing similarity computation methods, with notable advantages in accuracy and robustness for word similarity measurement tasks.
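The pipeline described in the abstract (constrained LLM generation, deep semantic embedding, similarity scoring, and correlation-based evaluation) can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: the `generate` and `embed` callables stand in for an arbitrary LLM completion API and sentence-embedding model, and the prompt wording, function names, and constraint parameters are hypothetical illustrations of the kind of lexical, grammatical, and length constraints the abstract mentions.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def build_constrained_prompt(word, pos=None, max_words=60):
    """Hypothetical prompt template: pins down the target lexical item,
    its grammatical category, and the length of the semantic description,
    aiming to reduce part-of-speech bias, redundant exemplars, and
    informational redundancy in the generated representation."""
    pos_clause = f" used as a {pos}" if pos else ""
    return (
        f"Define the word '{word}'{pos_clause} in at most {max_words} words. "
        "Give a single precise sense description without example sentences."
    )


def word_similarity(word_a, word_b, generate, embed, pos=None):
    """Sketch of a WSLE-style pipeline:
    1) an LLM produces a constrained semantic representation for each word,
    2) an embedding model maps each representation to a dense vector,
    3) cosine similarity quantifies how close the two words are."""
    desc_a = generate(build_constrained_prompt(word_a, pos))
    desc_b = generate(build_constrained_prompt(word_b, pos))
    return cosine_similarity(embed(desc_a), embed(desc_b))


def evaluate(predicted_scores, gold_scores):
    """Pearson's r and Spearman's rho between predicted similarities and
    human ratings (e.g., the gold scores in RG65, MC30, YP130, or MED38)."""
    r, _ = pearsonr(predicted_scores, gold_scores)
    rho, _ = spearmanr(predicted_scores, gold_scores)
    return r, rho
```

In use, `word_similarity` would be applied to every word pair in a benchmark, and `evaluate` would correlate the resulting scores with the dataset's human similarity ratings, mirroring the r and ρ evaluation reported in the abstract.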
