Design and Evaluation of a Secure Air-Gapped LLM-Based Intelligence Analysis Framework
Abstract
The increasing reliance on heterogeneous and high-volume data sources in security-sensitive analytical environments has amplified the cognitive burden placed on human analysts. While large language models (LLMs) demonstrate strong capabilities in semantic reasoning and multi-document synthesis, their deployment in regulated and air-gapped settings is constrained by data sovereignty requirements, auditability concerns, and the risk of hallucinated outputs.

This paper presents the design and empirical evaluation of a secure, air-gapped intelligence analysis framework integrating a high-capacity (400B-parameter class) on-premise LLM with a retrieval-augmented generation (RAG) pipeline and enforced human-in-the-loop oversight. The proposed architecture operates entirely offline and restricts model responses to access-controlled retrieved context, supported by immutable audit logging and structured validation workflows.

Experimental evaluation across multi-document analytical tasks demonstrates that retrieval-augmented grounding reduces the hallucination rate from 38.7% to 12.2% while maintaining interactive response latency. An ablation study further analyses the impact of retrieval depth on grounding accuracy and latency trade-offs. Results indicate that architectural grounding mechanisms contribute more substantially to reliability than model scale alone, supporting the feasibility of secure, high-capacity LLM deployment in regulated environments.

The findings highlight the importance of system-level constraints, contextual grounding, and enforced human oversight in achieving trustworthy LLM-assisted decision support within fully air-gapped operational contexts.