Contextual Grounding and Iterative Refinement: A Hybrid Framework for Reliable Organization-Task Extraction in Vietnamese Administrative Documents
Abstract
Extracting accurate Organization-Task relationships from complex administrative documents, that is, identifying which organizations are directly responsible for carrying out assigned tasks, is critical to workflow automation. The task remains challenging for Large Language Models (LLMs), which are prone to hallucination and tend to lose grounding in the source text. This work proposes a novel hybrid information extraction pipeline with iterative refinement, designed to automate this extraction task in Vietnamese administrative documents, and introduces a new benchmark dataset of 1,409 administrative directives. The architecture systematically addresses LLM unreliability through bidirectional span grounding, a multi-stage validation process that ensures every extracted relation can be verified as anchored in the source text. This turns LLMs from autonomous extractors into guided processors operating under explicit grounding constraints. Experimental results show that the framework achieves an F1-score of 93.8%, substantially outperforming standard zero-shot LLM baselines, whose F1-scores hover around 40%. The results further indicate that architectural validation and explicit grounding are essential for reliable automated extraction of direct responsibilities, and they yield a robust framework for automating core 'who does what' identification in high-stakes document domains.
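The abstract does not specify how bidirectional span grounding is implemented, so the following is only a minimal Python sketch of one possible reading, built on our own assumptions. We assume the LLM returns (organization, task) string pairs. A forward pass maps the organization string to the source sentences that contain it, and a backward pass re-reads those sentences to confirm that the task appears in one of them. A relation is therefore kept only if both arguments are attested in the same sentence, rather than stitched together from unrelated parts of the document. The names `Relation`, `normalize`, `split_sentences`, `forward_spans`, `is_grounded` and `filter_grounded`, as well as the naive sentence splitter, are hypothetical. The sketch covers a single validation stage, not the full multi-stage pipeline.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical Organization-Task pair as produced by an LLM."""
    organization: str  # responsible organization, as the model wrote it
    task: str          # assigned task, as the model wrote it


def normalize(text: str) -> str:
    # NFC-normalize (Vietnamese diacritics may arrive composed or decomposed),
    # collapse runs of whitespace, and case-fold before any comparison.
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split()).casefold()


def split_sentences(source: str) -> list[str]:
    # Naive splitter on '.', ';', '!', '?' followed by whitespace, or on newlines.
    # This is an assumption for illustration, not the paper's segmenter.
    parts = re.split(r"(?<=[.;!?])\s+|\n+", source)
    return [normalize(p) for p in parts if p.strip()]


def forward_spans(sentences: list[str], text: str) -> list[int]:
    # Forward (output -> source): indices of sentences containing the extracted string.
    needle = normalize(text)
    if not needle:
        return []
    return [i for i, sentence in enumerate(sentences) if needle in sentence]


def is_grounded(sentences: list[str], rel: Relation) -> bool:
    # Forward pass: anchor the organization in the source text.
    anchors = forward_spans(sentences, rel.organization)
    # Backward pass: re-read each anchoring sentence and require the task there too.
    task = normalize(rel.task)
    return bool(task) and any(task in sentences[i] for i in anchors)


def filter_grounded(source: str, relations: list[Relation]) -> list[Relation]:
    # Discard every candidate relation that cannot be anchored in a single sentence.
    sentences = split_sentences(source)
    return [rel for rel in relations if is_grounded(sentences, rel)]
```

For example, given the directive text "Giao Bộ Tài chính chủ trì xây dựng dự thảo Nghị định. Bộ Tư pháp thẩm định hồ sơ.", the candidate (Bộ Tài chính, xây dựng dự thảo Nghị định) survives. A hallucinated (Bộ Y tế, ...) pair is rejected by the forward pass, and a stitched (Bộ Tư pháp, xây dựng dự thảo Nghị định) pair is rejected by the backward pass. Exact substring matching is deliberately strict: it trades recall on paraphrased task descriptions for the guarantee that every surviving relation is verifiable in the source.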