Joint Entity and Relation Extraction Algorithm for Historical Literature Retrieval

Hang Zhao
Runjia Ma


Abstract

Historical literature contains abundant structured knowledge. However, its largely unstructured form, complex semantic associations, and high entity ambiguity limit the effectiveness of existing approaches: traditional pipeline methods suffer from error accumulation, and single-task models suffer from semantic fragmentation. To address these issues, this paper proposes a joint entity and relation extraction algorithm tailored to historical literature. The model integrates Gaussian-based local enhancement and a multi-channel knowledge matching module. Experimental results show that the proposed algorithm achieves a peak F1 score of 0.96, demonstrating strong overall performance. On the named entity recognition task, the model reaches an F1 score of 89.7%, well above the baseline score of 78.2%. On the relation classification task, the F1 score reaches 83.2%, which is 11.2% and 8.7% higher than those of the pipeline method and mainstream joint extraction models, respectively. Moreover, training time is as low as 450.5 s and inference time drops to 20.8 ms, giving a good balance between accuracy and efficiency. Ablation experiments confirm the effectiveness of the cross-task attention and Gaussian enhancement modules. This work offers an efficient approach to structured knowledge extraction from historical literature and has strong potential for advancing digital humanities research and the construction of large-scale historical knowledge bases.
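The abstract names a Gaussian-based local enhancement module but does not describe how it works. One common way to build such a component, sketched here purely as an assumption and not as the authors' actual design, is to subtract a Gaussian distance penalty from the self-attention logits. This makes each token attend more strongly to its neighbours, which can help with short entity spans. The function names `gaussian_attention_weights` and `gaussian_local_attention` and the width parameter `sigma` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_attention_weights(q: Matrix, k: Matrix, sigma: float = 1.0) -> Matrix:
    """Hypothetical Gaussian-biased attention weights (assumed design, not the paper's).

    logit[i][j] = q_i . k_j / sqrt(d) - (i - j)^2 / (2 * sigma^2)
    The penalty term is the log of an unnormalised Gaussian centred on
    position i, so tokens far from i receive less attention.
    """
    n, d = len(q), len(q[0])
    weights = []
    for i in range(n):
        logits = []
        for j in range(len(k)):
            score = sum(q[i][t] * k[j][t] for t in range(d)) / math.sqrt(d)
            penalty = (i - j) ** 2 / (2.0 * sigma ** 2)
            logits.append(score - penalty)
        weights.append(_softmax(logits))
    return weights


def gaussian_local_attention(q: Matrix, k: Matrix, v: Matrix, sigma: float = 1.0) -> Matrix:
    """Weighted sum of value vectors using the Gaussian-biased weights."""
    w = gaussian_attention_weights(q, k, sigma)
    dv = len(v[0])
    return [
        [sum(w[i][j] * v[j][t] for j in range(len(v))) for t in range(dv)]
        for i in range(len(q))
    ]


if __name__ == "__main__":
    # Toy example: 3 tokens with 2-dimensional embeddings (illustrative values only).
    x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    for row in gaussian_attention_weights(x, x, sigma=1.0):
        print([round(p, 3) for p in row])
```

A smaller `sigma` concentrates attention on adjacent tokens. A very large `sigma` makes the penalty negligible and recovers ordinary scaled dot-product attention, so the width acts as a tunable locality prior.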

Version published to 10.21203/rs.3.rs-7364552/v1 on Research Square, Oct 20, 2025

Related articles

Unsupervised Keyword Extraction Models Using Word-Set Averaging

This article has 1 author:
1. Pejman Khadivi
This article has no evaluations. Latest version Dec 15, 2025

DiLLaB: Discussion Labeling with LLMs for Building Datasets

This article has 6 authors:
1. Ludimila Gonçalves
2. Márcia Lima
3. André Carvalho
4. Walter Nakamura
5. Igor Steinmacher
6. Tayana Conte
This article has no evaluations. Latest version Jan 28, 2026

Coreference Resolution for Amharic Text using Bidirectional Encoder Representation from Transformer (BERT)

This article has 2 authors:
1. Lingerew Bantie
2. Yaregal Assabie
This article has no evaluations. Latest version Jan 12, 2026
