AutoClusRE: An automatic clustering-based method for relation extraction and knowledge graph construction

Abstract

Knowledge graphs play a vital role in integrating and representing information. Relation extraction, a core step in knowledge graph construction, identifies and establishes semantic relationships between entities. However, existing methods rely heavily on annotated data and cannot automatically extract relations or build a knowledge graph from unlabeled text, which poses a significant challenge for knowledge graph construction. To address this challenge, this paper proposes AutoClusRE (Automatic Clustering-based Relation Extraction), a domain-adaptive framework for entity-relation extraction and knowledge graph construction that uses a large language model to extract relational triples from unlabeled text. The method requires no additional training and can be applied directly to unlabeled text. Specifically, AutoClusRE constructs an entity type table and a relation type table, embeds them in the prompt templates of the large language model, and dynamically updates them during extraction so that they serve as contextual guidance for the model. In addition, a semantics-based clustering algorithm merges duplicate types and keeps the type tables from growing excessively. Experiments on the public dataset FOBIE and the custom dataset HSA (Huizhou-style architecture) show that AutoClusRE achieves competitive extraction performance without relying on any annotated data. On FOBIE, AutoClusRE outperformed traditional deep learning models by 14.8% and DeepSeek by 9.5% in terms of F1 score. On HSA, AutoClusRE's F1 score improved by 17.2% over traditional deep learning models and by 2% over DeepSeek. Beyond introducing a relation extraction method for information management in unknown domains, this study uses AutoClusRE to construct a knowledge graph for Huizhou-style architecture, contributing to the digital research and application of domain knowledge in this field.
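
The type-table mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `TypeTable`, the function `build_prompt`, the prompt wording, and the threshold value are all hypothetical, and a character-trigram cosine similarity stands in for the semantic similarity that the paper's clustering algorithm would compute. The sketch uses a simple greedy merge: a newly proposed type is folded into the closest existing type when the similarity is high enough, and otherwise added as a new entry.

```python
from collections import Counter
from math import sqrt


def trigram_vector(label: str) -> Counter:
    """Toy stand-in for a semantic embedding: character trigram counts."""
    padded = f"  {label.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TypeTable:
    """Hypothetical type table that merges near-duplicate labels greedily."""

    def __init__(self, threshold: float = 0.6):
        self.threshold = threshold          # assumed value, not from the paper
        self.canonical: list[str] = []      # types shown to the language model
        self.aliases: dict[str, str] = {}   # proposed label -> canonical type

    def add(self, label: str) -> str:
        """Register a proposed type and return the canonical type it maps to."""
        label = label.strip()
        if label in self.aliases:
            return self.aliases[label]
        vec = trigram_vector(label)
        best, best_sim = None, 0.0
        for existing in self.canonical:
            sim = cosine(vec, trigram_vector(existing))
            if sim > best_sim:
                best, best_sim = existing, sim
        if best is not None and best_sim >= self.threshold:
            self.aliases[label] = best      # merge into the closest existing type
            return best
        self.canonical.append(label)        # otherwise the table grows by one entry
        self.aliases[label] = label
        return label


def build_prompt(text: str, entity_types: TypeTable, relation_types: TypeTable) -> str:
    """Embed the current type tables in a (hypothetical) extraction prompt."""
    return (
        "Extract (head, relation, tail) triples from the text.\n"
        f"Known entity types: {', '.join(entity_types.canonical) or 'none yet'}\n"
        f"Known relation types: {', '.join(relation_types.canonical) or 'none yet'}\n"
        "Reuse a known type when it fits; propose a new one otherwise.\n"
        f"Text: {text}"
    )
```

In a full pipeline, each type returned by the language model would pass through `add` before the next prompt is built, so later prompts see the merged, deduplicated tables. Replacing `trigram_vector` with a sentence-embedding model would bring the sketch closer to a semantics-based comparison.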
