A Unified Intelligent Information System for Clause Extraction, Risk Identification, and Consistency Analysis in Legal and Policy Documents Using Multi Model LLM Integration and Structured Knowledge Representation

Abstract

Modern organizations increasingly rely on large-scale legal and policy documents for compliance, governance, and regulatory decision-making. However, manual analysis is time-consuming, error-prone, and insufficient for capturing complex semantic relationships and cross-clause dependencies. While advances in large language models (LLMs) have improved automated text understanding, existing approaches treat clause extraction, risk identification, and consistency verification as separate tasks, which limits document-level reasoning. Additionally, cloud-based processing raises privacy concerns, and limited interpretability reduces user trust. This paper proposes a unified intelligent information system that integrates multiple LLMs with structured data storage for context-aware legal document analysis. The framework introduces a multi-model architecture that uses shared semantic representations to enable joint clause interpretation, contextual risk assessment, and cross-clause consistency analysis via natural language inference. A structured storage layer supports efficient indexing and retrieval of clause-level knowledge, while an explainability module provides evidence-grounded reasoning to enhance transparency. The system is designed for privacy-preserving offline deployment, so that sensitive data is processed locally. Experimental results on benchmark datasets show that the approach achieves a 92.4% F1-score in clause extraction, an 89.7% F1-score in risk classification, and 91.2% precision in consistency detection, outperforming traditional machine learning models, transformer-based models, and standalone LLM pipelines. These findings demonstrate improved document-level understanding, scalability, and interpretability. Code and datasets are available at https://doi.org/10.5281/zenodo.19239518 and in the GitHub repository https://github.com/annuu005/cogdoc.git.
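
To make the combination of clause-level structured storage and cross-clause consistency checking more concrete, the following is a minimal sketch and not the authors' implementation. The table schema, the function names (`build_store`, `toy_nli`, `find_inconsistencies`), and the label strings are all assumptions introduced for illustration. In particular, `toy_nli` is a trivial stand-in for a real natural language inference model: it only flags clause pairs that mention different numbers.

```python
import sqlite3
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical NLI label names; a real model may use different labels.
NEUTRAL, CONTRADICTION = "neutral", "contradiction"


def build_store(clauses: List[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory clause table indexed by clause type (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clause (id TEXT PRIMARY KEY, type TEXT, text TEXT)")
    conn.execute("CREATE INDEX idx_clause_type ON clause(type)")
    conn.executemany("INSERT INTO clause VALUES (?, ?, ?)", clauses)
    conn.commit()
    return conn


def toy_nli(premise: str, hypothesis: str) -> str:
    """Stand-in for an NLI model: differing numeric tokens count as a contradiction."""
    def numbers(text: str) -> set:
        return {tok for tok in text.split() if tok.isdigit()}

    a, b = numbers(premise), numbers(hypothesis)
    return CONTRADICTION if a and b and a != b else NEUTRAL


def find_inconsistencies(
    conn: sqlite3.Connection,
    clause_type: str,
    nli: Callable[[str, str], str] = toy_nli,
) -> List[Tuple[str, str]]:
    """Return ID pairs of same-type clauses that the NLI function labels contradictory."""
    rows = conn.execute(
        "SELECT id, text FROM clause WHERE type = ? ORDER BY id", (clause_type,)
    ).fetchall()
    return [
        (id_a, id_b)
        for (id_a, text_a), (id_b, text_b) in combinations(rows, 2)
        if nli(text_a, text_b) == CONTRADICTION
    ]
```

Restricting comparisons to clauses of the same type keeps the number of pairwise NLI calls small. In a fuller system, `toy_nli` would be replaced by a locally hosted inference model, in line with the offline deployment described above.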
