A Discourse-Aware Graph–Text Framework for Coherent and Faithful Summarization of Long Legal Documents
Abstract
Summarization systems often fail to maintain coherence and factual consistency on long documents. The problem is especially visible in legal and governmental reports, which require reasoning across multiple sections and carry a large amount of underlying factual context. This work proposes a hybrid framework that performs fact-aware chunking of the input and combines hierarchical attention with a fact-based graph to capture relations across sentences, chunks, and fact tuples extracted from the text. We also propose a citation-aware decoding mechanism that aligns each generated statement with the evidence that supports it, which significantly improves interpretability and reduces unsupported content. Evaluated on the GovReport dataset against baseline long-document summarizers, the framework achieves 52.80 ROUGE-1, 21.50 ROUGE-2, and 24.40 ROUGE-L. It also shows improved factual grounding, with a 93.9% fact similarity score and an average of 11 evidence-supported sentences per summary. These results confirm that the proposed framework produces summaries that are more faithful, coherent, and strongly supported by the source facts.
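The abstract does not say how the citation-aware decoder scores candidate evidence. As a rough illustration only, the sketch below swaps in a simple lexical-overlap heuristic. Each summary sentence is linked to the source sentence that covers the largest share of its tokens, and sentences that fall below a threshold are flagged as unsupported. The function `align_to_evidence`, the token-recall score, and the 0.5 threshold are hypothetical choices made for this example. They are not the authors' method, which operates inside decoding rather than as a post-hoc pass.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Alignment:
    summary_sentence: str
    evidence_index: Optional[int]  # best-matching source sentence, or None if unsupported
    score: float                   # fraction of summary tokens found in that source sentence


def align_to_evidence(summary_sents: list[str], source_sents: list[str],
                      threshold: float = 0.5) -> list[Alignment]:
    """Hypothetical post-hoc aligner: link each summary sentence to the source
    sentence with the highest token recall; below `threshold`, mark it unsupported."""
    source_tokens = [tokens(s) for s in source_sents]
    results = []
    for sent in summary_sents:
        s_tok = tokens(sent)
        best_idx, best_score = None, 0.0
        for i, src_tok in enumerate(source_tokens):
            score = len(s_tok & src_tok) / len(s_tok) if s_tok else 0.0
            if score > best_score:
                best_idx, best_score = i, score
        if best_score < threshold:
            best_idx = None  # no source sentence covers enough of this statement
        results.append(Alignment(sent, best_idx, best_score))
    return results


if __name__ == "__main__":
    source = ["The agency spent $2 million on audits in 2020.",
              "Staffing levels declined by 10 percent."]
    summary = ["The agency spent $2 million on audits.",
               "The program was cancelled."]
    for a in align_to_evidence(summary, source):
        print(a.evidence_index, round(a.score, 2), a.summary_sentence)
```

Counting the alignments whose `evidence_index` is not `None` gives a per-summary count of evidence-supported sentences. This is similar in spirit to the abstract's "sentences supported by evidence" figure, but it is not claimed to be the same computation. A lexical heuristic like this misses paraphrases, so a real system would more likely use embedding similarity or an entailment model.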