Information-Optimized and Adaptive Document Segmentation for Multilingual Knowledge Graphs

Abstract

Building precise knowledge representations from long documents and in low-resource language settings is difficult because large language models (LLMs) are inherently limited in how well they process extended text sequences. Existing methods degrade as input fragments grow longer, and the effect is most pronounced for low-resource languages, where scarce data hinders accurate identification of entities and relationships. We present IOADS (Information-Optimized Adaptive Document Segmentation), a segmentation method that uses dynamic contextual windowing to preserve essential semantic relationships across long documents. The approach substantially improves graph-structured knowledge extraction: for English, entity identification is 24% higher and relationship extraction is 39% higher. For Afrikaans, a low-resource language, IOADS improves these metrics by 49% and 82%, respectively. IOADS also establishes new benchmarks on key question-answering criteria, including informational breadth, response diversity, and cognitive empowerment.
