A Unified Intelligent Information System for Clause Extraction, Risk Identification, and Consistency Analysis in Legal and Policy Documents Using Multi Model LLM Integration and Structured Knowledge Representation

Veerababu Reddy
Pravallika Bhosale
Sreeja Alle
Anees Abdul
Purna Chandu Repana
Sri Vardhini Nidamanuri


Abstract

Modern organizations increasingly rely on large-scale legal and policy documents for compliance, governance, and regulatory decision-making. However, manual analysis is time-consuming, error-prone, and insufficient for capturing complex semantic relationships and cross-clause dependencies. While advances in large language models (LLMs) have improved automated text understanding, existing approaches treat clause extraction, risk identification, and consistency verification as separate tasks, which limits document-level reasoning. In addition, cloud-based processing raises privacy concerns, and limited interpretability reduces user trust. This paper proposes a unified intelligent information system that integrates multiple LLMs with structured data storage for context-aware legal document analysis. The framework introduces a multi-model architecture that uses shared semantic representations to enable joint clause interpretation, contextual risk assessment, and cross-clause consistency analysis via natural language inference. A structured storage layer supports efficient indexing and retrieval of clause-level knowledge, while an explainability module provides evidence-grounded reasoning to enhance transparency. The system is designed for privacy-preserving offline deployment, so that sensitive data is processed locally. Experimental results on benchmark datasets show that the approach achieves a 92.4% F1 score in clause extraction, an 89.7% F1 score in risk classification, and 91.2% precision in consistency detection, outperforming traditional machine learning models, transformer-based models, and standalone LLM pipelines. These findings indicate improved document-level understanding, scalability, and interpretability. DOI for code and datasets: https://doi.org/10.5281/zenodo.19239518. GitHub repository: https://github.com/annuu005/cogdoc.git.
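
To make the idea of clause-level indexing feeding a cross-clause consistency check more concrete, the following is a minimal illustrative sketch, not the authors' implementation. All names here (`Clause`, `ClauseStore`, `find_conflicts`, `toy_nli`) are hypothetical, the inverted index is a stand-in for whatever structured storage the system actually uses, and `toy_nli` is a trivial negation heuristic standing in for a real natural language inference model.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Clause:
    clause_id: str
    text: str


class ClauseStore:
    """Toy clause-level store with an inverted index over lowercase word tokens."""

    def __init__(self) -> None:
        self._clauses: Dict[str, Clause] = {}
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def add(self, clause: Clause) -> None:
        self._clauses[clause.clause_id] = clause
        for token in set(re.findall(r"[a-z]+", clause.text.lower())):
            self._index[token].add(clause.clause_id)

    def lookup(self, term: str) -> List[Clause]:
        # Return clauses containing the term, ordered by clause ID.
        ids = sorted(self._index.get(term.lower(), set()))
        return [self._clauses[i] for i in ids]


# Hypothetical NLI interface: (premise, hypothesis) -> label string.
NliFn = Callable[[str, str], str]


def find_conflicts(store: ClauseStore, term: str, nli: NliFn) -> List[Tuple[str, str]]:
    """Return ID pairs of clauses sharing `term` that `nli` labels as contradictory."""
    conflicts = []
    for a, b in combinations(store.lookup(term), 2):
        if nli(a.text, b.text) == "contradiction":
            conflicts.append((a.clause_id, b.clause_id))
    return conflicts


def toy_nli(premise: str, hypothesis: str) -> str:
    """Placeholder, NOT a real NLI model: flags a pair when exactly one side contains 'not'."""
    p_neg = " not " in f" {premise.lower()} "
    h_neg = " not " in f" {hypothesis.lower()} "
    return "contradiction" if p_neg != h_neg else "neutral"


store = ClauseStore()
store.add(Clause("c1", "The supplier shall indemnify the buyer."))
store.add(Clause("c2", "The supplier shall not indemnify the buyer."))
store.add(Clause("c3", "Payment is due within 30 days."))
conflicts = find_conflicts(store, "indemnify", toy_nli)
```

In this sketch the index narrows the comparison to clauses that share a term, so the (potentially expensive) inference step runs only on candidate pairs rather than on every pair in the document. A real deployment would replace `toy_nli` with a locally hosted NLI model to stay consistent with the offline, privacy-preserving design described above.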

Version published to 10.21203/rs.3.rs-9278472/v1 on Research Square
Apr 2, 2026

