SAFi: A Self-Alignment Framework for Verifiable Runtime Governance of Large Language Models
Abstract
The deployment of powerful Large Language Models (LLMs) in high-stakes domains presents a critical challenge: ensuring reliable adherence to behavioral constraints at runtime. Existing alignment techniques, which focus primarily on pre-deployment training, often fail to prevent model drift or rule violations in live, interactive environments. This paper introduces SAFi (Self-Alignment Framework Interface), a novel, closed-loop framework for the runtime governance of LLMs. SAFi is structured around four distinct faculties (Intellect, Will, Conscience, and Spirit) that separate content generation from rule validation, enabling a continuous cycle of generation, verification, auditing, and adaptation. The framework's key innovation is a stateful, adaptive memory, managed by the mathematical Spirit faculty, which allows the system to monitor its own performance and correct for behavioral drift over time. We present the results of two empirical benchmark studies comparing a SAFi-governed LLM against a standalone baseline in the high-stakes domains of finance and healthcare. The results demonstrate that SAFi achieves near-100% adherence to its configured safety rules, whereas the baseline model exhibits catastrophic failures. We conclude that runtime governance frameworks such as SAFi are an essential component for building demonstrably safe and reliable AI agents.
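To make the closed-loop structure described above concrete, the following is a minimal illustrative sketch, not the authors' implementation: placeholder functions stand in for the Intellect (generation), Will (rule validation), Conscience (auditing), and Spirit (stateful adaptation) faculties, and all function names, rules, and thresholds are hypothetical assumptions.

```python
# Minimal sketch of a generate -> verify -> audit -> adapt loop.
# All names and rules here are hypothetical, not part of SAFi itself.

FORBIDDEN_PHRASES = ["guaranteed returns"]  # hypothetical safety rule


def intellect_generate(prompt: str, guidance: str) -> str:
    """Stand-in for the LLM call that drafts a response."""
    return f"[{guidance}] Draft answer to: {prompt}"


def will_validate(draft: str) -> bool:
    """Approve only drafts that violate none of the configured rules."""
    return not any(p in draft.lower() for p in FORBIDDEN_PHRASES)


def conscience_audit(draft: str, approved: bool) -> dict:
    """Record an audit entry for this turn."""
    return {"approved": approved, "length": len(draft)}


class Spirit:
    """Stateful memory that tracks adherence and adapts future guidance."""

    def __init__(self) -> None:
        self.history: list[dict] = []

    def update(self, audit: dict) -> str:
        self.history.append(audit)
        approved = sum(a["approved"] for a in self.history)
        rate = approved / len(self.history)
        # Tighten guidance if the running adherence rate drifts too low.
        return "strict" if rate < 0.9 else "normal"


def governed_turn(prompt: str, spirit: Spirit, guidance: str) -> tuple[str, str]:
    draft = intellect_generate(prompt, guidance)
    approved = will_validate(draft)
    audit = conscience_audit(draft, approved)
    next_guidance = spirit.update(audit)
    reply = draft if approved else "Response withheld: rule violation."
    return reply, next_guidance


if __name__ == "__main__":
    spirit = Spirit()
    guidance = "normal"
    for question in ["Is this fund safe?", "Promise me guaranteed returns."]:
        reply, guidance = governed_turn(question, spirit, guidance)
        print(reply)
```

The separation of `intellect_generate` from `will_validate` mirrors the abstract's central design choice: the model that produces content never certifies its own compliance, while the stateful `Spirit` object carries adherence history across turns so guidance can adapt when drift appears.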