Tuning Knowledge Graph Embeddings in Clustering with LISE

Verdiana Schena
Simona Colucci
Francesco Maria Donini
Floriano Scioscia
Eugenio Di Sciascio

Abstract

Background: Knowledge Graph Embeddings are increasingly used in biomedical informatics to support similarity assessment, clustering, and knowledge discovery. Despite strong performance in link prediction, recent studies show that numerical proximity in embedding spaces does not always reflect meaningful semantic similarity. LISE, a logic-based interactive similarity explainer, was introduced to expose shared semantic properties among clustered RDF resources and incorporate user feedback when evaluating cluster coherence. This work extends LISE by integrating Large Language Models for natural-language explanation and investigating whether user-derived relevance signals can actively influence embedding generation, improving the semantic adequacy of similarity-based clustering.
Results: We replaced LISE’s template-based verbalization component with a Gemini 2.5 Flash module capable of generating human-readable, path-level explanations of logical Common Subsumers. This resolves previous LISE limitations related to granularity and anaphora resolution, enabling reliable sentence-level user evaluations. To assess whether user preferences can guide the embedding process, we simulated feedback on 1,280 DrugBank-derived triples and evaluated two custom pyRDF2Vec sampling strategies: Predicate Relevance Weight and Predicate-Object Relevance Weight. Relevance weights were learned via a random forest regressor trained on simulated user scores. Predicate-level weighting increased the presence of user-preferred predicates in the most cohesive clusters, with all predicates showing positive or neutral deviation under learned weights. By contrast, predicate-object weighting exhibited limited sensitivity, with most pairs showing unchanged frequency regardless of weight assignment. Average deviation metrics confirm that predicate-level adjustments redirect clustering more effectively toward semantically meaningful biomedical information.
Conclusions: User-informed predicate weighting can successfully influence embedding-based clustering, improving alignment with semantically relevant biomedical properties. Predicate-object adjustments provide minimal benefit. Part of this research has been published in the proceedings of the 8th Workshop on Semantic Web Solutions for Large-scale Biomedical Data Analytics (SeWeBMeDA 2025).
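
The idea of predicate-level relevance weighting can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation and does not use the pyRDF2Vec API: the toy triples, the feedback scores, and the names `learn_predicate_weights` and `weighted_walk` are all hypothetical. As a loudly stated simplification, the random forest regressor described in the abstract is replaced here by a per-predicate mean of simulated scores. The resulting weights then bias which edge a random walk follows.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy triples (subject, predicate, object); not DrugBank data.
TRIPLES = [
    ("drugA", "hasTarget", "protein1"),
    ("drugA", "hasCategory", "cat1"),
    ("drugA", "hasSynonym", "nameA"),
    ("drugB", "hasTarget", "protein1"),
    ("drugB", "hasSynonym", "nameB"),
    ("protein1", "hasFunction", "func1"),
]

# Hypothetical simulated user feedback: (predicate, relevance score in [0, 1]).
FEEDBACK = [
    ("hasTarget", 0.9),
    ("hasTarget", 0.8),
    ("hasCategory", 0.6),
    ("hasSynonym", 0.1),
    ("hasFunction", 0.7),
]


def learn_predicate_weights(feedback):
    """Stand-in for the learned model: mean simulated score per predicate."""
    scores = defaultdict(list)
    for predicate, score in feedback:
        scores[predicate].append(score)
    return {p: sum(s) / len(s) for p, s in scores.items()}


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def weighted_walk(adjacency, weights, start, depth, rng, default=0.5):
    """Random walk whose next edge is drawn in proportion to predicate weight."""
    walk, node = [start], start
    for _ in range(depth):
        edges = adjacency.get(node)
        if not edges:
            break  # dead end: no outgoing edges
        edge_weights = [weights.get(p, default) for p, _ in edges]
        predicate, obj = rng.choices(edges, weights=edge_weights, k=1)[0]
        walk.extend([predicate, obj])
        node = obj
    return walk


weights = learn_predicate_weights(FEEDBACK)
adjacency = build_adjacency(TRIPLES)
rng = random.Random(0)

# Count which predicate is taken first in many walks starting from drugA.
first_hops = Counter(
    weighted_walk(adjacency, weights, "drugA", depth=2, rng=rng)[1]
    for _ in range(2000)
)
print(first_hops)
```

In this sketch, predicates with higher simulated relevance are followed more often, so they occur more frequently in the walk corpus from which embeddings would be trained. A predicate-object variant would key the weights on `(predicate, object)` pairs instead of on the predicate alone.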

Version published to 10.21203/rs.3.rs-8250999/v1 on Research Square
Dec 15, 2025

