AutoClusRE: An automatic clustering-based method for relation extraction and knowledge graph construction

TongYao Wang
Donghui Shi
Jose Aguilar
Jozef Zurada


Abstract

Knowledge graphs play a vital role in integrating and representing information. As a core component of knowledge graph construction, relation extraction identifies and establishes semantic relationships between entities. However, existing methods rely heavily on annotated data. Given unlabeled text, they cannot automatically extract relations and construct a knowledge graph, which poses a significant challenge for knowledge graph construction. To address this challenge, this paper proposes AutoClusRE (Automatic Clustering-based Relation Extraction), a domain-adaptive entity–relation extraction and knowledge graph construction framework that is based on a large language model and designed to extract relational triples from unlabeled text. The method requires no additional training and performs adaptive entity–relation extraction and knowledge graph construction directly on unlabeled text. Specifically, AutoClusRE constructs an entity type table and a relation type table, embeds them in the prompt templates of the large language model, and dynamically updates them during extraction so that they serve as contextual guidance for the model. In addition, a semantics-based clustering algorithm merges duplicate types and prevents the type tables from expanding excessively. Experiments on the public dataset FOBIE and the custom dataset HSA (Huizhou-style architecture) demonstrate that AutoClusRE achieves competitive extraction performance without relying on any annotated data. On FOBIE, AutoClusRE outperformed traditional deep learning models by 14.8% and DeepSeek by 9.5% in F1 score. On HSA, its F1 score was 17.2% higher than that of traditional deep learning models and 2% higher than that of DeepSeek.
This study not only introduces a novel relation extraction method for information management in unknown domains but also uses AutoClusRE to construct a knowledge graph of Huizhou-style architecture, contributing to the digital research and application of domain knowledge in this field.
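The abstract does not specify how the type tables are represented or how the clustering decides that two types are duplicates, so the following is only a minimal sketch under stated assumptions. `TypeTable`, `build_prompt`, the 0.6 threshold, and the prompt wording are all hypothetical. Character-trigram cosine similarity stands in for the semantic embeddings a real system would use, and the merge rule is a simple greedy "nearest representative above a threshold" clustering, not necessarily AutoClusRE's algorithm. It shows the idea of folding each newly proposed type label into an existing type when it is similar enough, so the table injected into the next prompt stays compact.

```python
from collections import Counter
from math import sqrt


def trigram_vector(label: str) -> Counter:
    """Stand-in for a semantic embedding: character-trigram counts of a type label."""
    text = f"  {label.lower().strip()} "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TypeTable:
    """Hypothetical type table: canonical labels plus an alias map for merged duplicates."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold      # assumed merge threshold, not from the paper
        self.canonical: list[str] = []  # one representative label per cluster
        self.alias: dict[str, str] = {} # every label seen -> its canonical label

    def add(self, label: str) -> str:
        """Assign `label` to the most similar existing type, or open a new type."""
        if label in self.alias:
            return self.alias[label]
        vec = trigram_vector(label)
        best, best_sim = None, 0.0
        for rep in self.canonical:
            sim = cosine(vec, trigram_vector(rep))
            if sim > best_sim:
                best, best_sim = rep, sim
        if best is not None and best_sim >= self.threshold:
            self.alias[label] = best    # merge: the table does not grow
            return best
        self.canonical.append(label)    # new cluster, this label is its representative
        self.alias[label] = label
        return label


def build_prompt(text: str, entity_types: TypeTable, relation_types: TypeTable) -> str:
    """Embed the current type tables in an extraction prompt (hypothetical wording)."""
    return (
        "Extract (head, relation, tail) triples from the text below.\n"
        f"Known entity types: {', '.join(entity_types.canonical) or 'none yet'}\n"
        f"Known relation types: {', '.join(relation_types.canonical) or 'none yet'}\n"
        "Reuse a known type when one fits; otherwise propose a new type.\n"
        f"Text: {text}"
    )


# Usage: labels proposed by the model on successive passages are folded into the tables.
entity_types, relation_types = TypeTable(), TypeTable()
for label in ["located in", "located_in", "built by"]:
    relation_types.add(label)
print(relation_types.canonical)  # ['located in', 'built by']
print(build_prompt("passage text", entity_types, relation_types))
```

Here "located_in" is merged into "located in" (similarity 8/11 ≈ 0.73), while "built by" shares no trigrams with it and opens a new type. Swapping `trigram_vector` for a sentence-embedding model would let synonyms with different spellings merge as well.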
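As an illustration of the final construction step only, the sketch below loads already-extracted triples into a small in-memory graph and queries an entity's neighborhood. The `KnowledgeGraph` class and its methods are hypothetical and do not describe AutoClusRE's actual storage layer. The example triples are illustrative and are not taken from the HSA dataset.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal in-memory triple store (hypothetical sketch, not AutoClusRE's storage)."""

    def __init__(self) -> None:
        # entity -> set of (relation, neighbor); sets drop exact duplicate triples
        self.out_edges: dict[str, set[tuple[str, str]]] = defaultdict(set)
        self.in_edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        self.out_edges[head].add((relation, tail))
        self.in_edges[tail].add((relation, head))

    def neighbors(self, entity: str) -> list[tuple[str, str, str]]:
        """All triples mentioning `entity` as head or tail, sorted for stable output."""
        triples = {(entity, r, t) for r, t in self.out_edges.get(entity, set())}
        triples |= {(h, r, entity) for r, h in self.in_edges.get(entity, set())}
        return sorted(triples)

    def num_triples(self) -> int:
        return sum(len(edges) for edges in self.out_edges.values())


# Usage with illustrative triples (not from the HSA dataset).
kg = KnowledgeGraph()
extracted = [
    ("horse-head wall", "part of", "Huizhou-style dwelling"),
    ("Huizhou-style dwelling", "has feature", "skywell"),
    ("horse-head wall", "part of", "Huizhou-style dwelling"),  # repeated in another passage
]
for head, rel, tail in extracted:
    kg.add_triple(head, rel, tail)
print(kg.num_triples())  # 2
print(kg.neighbors("Huizhou-style dwelling"))
```

Storing both outgoing and incoming edges keeps neighborhood lookups cheap in either direction. A production system would more likely write to a graph database or an RDF store.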

Version published to 10.21203/rs.3.rs-7705090/v1 on Research Square, Nov 4, 2025

Related articles

Joint Entity and Relation Extraction Algorithm for Historical Literature Retrieval

This article has 2 authors:
1. Hang Zhao
2. Runjia Ma
This article has no evaluations. Latest version Oct 20, 2025

CAIRD: Context-Aware Implicit Relation Discovery in Multi-Event Chains

This article has 2 authors:
1. Salma Ali
2. Noah Fang
This article has no evaluations. Latest version Oct 28, 2025

NeSyMatch: A Neuro-Symbolic Approach for Knowledge Alignment

This article has 2 authors:
1. Abhisek Sharma
2. Sarika Jain
This article has no evaluations. Latest version Nov 20, 2025
