Innovative Guardrails for Generative AI: Designing an Intelligent Filter for Safe and Responsible LLM Deployment

Olga Shvetsova
Danila Katalshov
Sang-Kon Lee

Read the full article

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

This paper proposes a technological framework designed to mitigate the inherent risks associated with the deployment of artificial intelligence (AI) in decision-making and task execution within the management processes. The Agreement Validation Interface (AVI) functions as a modular Application Programming Interface (API) Gateway positioned between user applications and LLMs. This gateway architecture is designed to be LLM-agnostic, meaning it can operate with various underlying LLMs without requiring specific modifications for each model. This universality is achieved by standardizing the interface for requests and responses and applying a consistent set of validation and enhancement processes irrespective of the chosen LLM provider, thus offering a consistent governance layer across a diverse LLM ecosystem. AVI facilitates the orchestration of multiple AI subcomponents for input–output validation, response evaluation, and contextual reasoning, thereby enabling real-time, bidirectional filtering of user interactions. A proof-of-concept (PoC) implementation of AVI was developed and rigorously evaluated using industry-standard benchmarks. The system was tested for its effectiveness in mitigating adversarial prompts, reducing toxic outputs, detecting personally identifiable information (PII), and enhancing factual consistency. The results demonstrated that AVI reduced successful fast injection attacks by 82%, decreased toxic content generation by 75%, and achieved high PII detection performance (F1-score ≈ 0.95). Furthermore, the contextual reasoning module significantly improved the neutrality and factual validity of model outputs. Although the integration of AVI introduced a moderate increase in latency, the overall framework effectively enhanced the reliability, safety, and interpretability of LLM-driven applications. AVI provides a scalable and adaptable architectural template for the responsible deployment of generative AI in high-stakes domains such as finance, healthcare, and education, promoting safer and more ethical use of AI technologies.

Version published to 10.3390/app15137298
Jun 28, 2025
Version published to 10.20944/preprints202505.1116.v1
May 15, 2025

AI-Powered Automated Bug Bounty Platform

This article has 5 authors:
1. Tahir Naquash
2. Zeeshan Yalakpalli
3. Shania Margaret Saini
4. Shivshankar -
5. Ayesha Siddiqua
This article has no evaluationsLatest version Jun 17, 2025
SmartFITLab: Intelligent Execution and Validation Platform for 5G Field Interoperability Testing

This article has 1 author:
1. Tongwei Tu
This article has no evaluationsLatest version Jun 12, 2025
Evaluating the Impact of Reinforcement Learning on Autonomous CI/CD Workflow Optimization

This article has 2 authors:
1. Owen Graham
2. Kelvin Kloss
This article has no evaluationsLatest version Jun 17, 2025

Listed in

Abstract

Article activity feed

Related articles

AI-Powered Automated Bug Bounty Platform

SmartFITLab: Intelligent Execution and Validation Platform for 5G Field Interoperability Testing

Evaluating the Impact of Reinforcement Learning on Autonomous CI/CD Workflow Optimization