Joint Entity and Relation Extraction Algorithm for Historical Literature Retrieval
Abstract
Historical literature contains abundant structured knowledge, but the text is highly unstructured, its semantic associations are complex, and its entities are often ambiguous. As a result, traditional pipeline methods suffer from error accumulation, and single-task models suffer from semantic fragmentation. To address these issues, this paper proposes a joint entity and relation extraction algorithm tailored to historical literature. The model additionally integrates Gaussian-based local enhancement and a multi-channel knowledge matching module. Experimental results show that the proposed algorithm achieves a peak F1 score of 0.96. On the named entity recognition task, the model reaches an F1 score of 89.7%, compared with 78.2% for the baseline. On the relation classification task, the F1 score reaches 83.2%, which is 11.2% and 8.7% higher than the pipeline method and mainstream joint extraction models, respectively. Moreover, the training time is as low as 450.5 s and the inference time drops to 20.8 ms, giving a good balance between accuracy and efficiency. Ablation experiments confirm the effectiveness of the cross-task attention and Gaussian enhancement modules. This work offers an efficient approach to structured knowledge extraction from historical literature and has potential to advance digital humanities research and the construction of large-scale historical knowledge bases.
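For readers unfamiliar with the metric, the F1 scores reported above are the harmonic mean of precision and recall. The following is a minimal sketch of how such a score can be computed for extracted (head, relation, tail) triples under exact matching. The function name `triple_f1`, the exact-match criterion, and the sample triples are illustrative assumptions and are not taken from the paper, whose evaluation protocol may differ.

```python
from typing import Iterable, Tuple

# (head entity, relation, tail entity); hypothetical representation
Triple = Tuple[str, str, str]


def triple_f1(gold: Iterable[Triple], pred: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact triple matching (an assumption)."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up sample data for illustration only.
gold = [("Sima Qian", "author_of", "Shiji"), ("Han Wudi", "ruler_of", "Han dynasty")]
pred = [("Sima Qian", "author_of", "Shiji"), ("Sima Qian", "ruler_of", "Han dynasty")]
precision, recall, f1 = triple_f1(gold, pred)
```

In this toy case one of the two predicted triples matches a gold triple, so a wrong head entity costs both precision and recall. This is one way an entity error carries over into relation extraction.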