Zero-Trust for Agents: Capability Grants, Tripwires, Immutable Logs
Abstract
Agentic AI systems can plan and act across tools, raising novel safety and governance risks in production. This preprint proposes a Zero-Trust architecture for agents built on three pillars: capability grants (scoped, short-lived permissions that enforce least privilege), tripwires (runtime policy checks and anomaly detectors that gate or halt actions), and immutable logs (append-only evidence to support oversight, forensics, and rollback). We map each control to the EU AI Act's Article 14 human-oversight obligations and to the NIST AI RMF functions (Govern/Map/Measure/Manage), and provide a control-to-requirement matrix along with KPIs/SLOs (e.g., p95 override latency, percentage of gated actions, log completeness, incident MTTR). An ASCII reference diagram and a capability-grant matrix make the design deployable; a compact threat model and micro-evaluation (using OWASP LLM01/LLM06 and Salesforce-style prompt-injection patterns) demonstrate how the control plane contains both direct and indirect attacks. The result is a practical blueprint that lets organizations adopt AI agents with verifiable guardrails, meeting emerging regulatory expectations while preserving velocity.
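To make the three pillars concrete, the following is a minimal illustrative sketch (not the paper's reference implementation): a scoped, expiring capability grant, a gate that runs a placeholder tripwire check before allowing a tool action, and a hash-chained append-only log. The class and field names (`CapabilityGrant`, `ControlPlane`, the `"email.send"` tool, the injection-pattern string) are hypothetical choices for illustration; a production detector would be far more sophisticated than the substring check shown.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityGrant:
    """Scoped, short-lived permission for a single tool action (least privilege)."""
    agent_id: str
    tool: str           # e.g. "email.send"
    scope: str          # e.g. "recipients:internal"
    expires_at: float   # epoch seconds; expiry keeps grants short-lived

    def allows(self, tool: str, scope: str, now: float) -> bool:
        return self.tool == tool and self.scope == scope and now < self.expires_at


class ControlPlane:
    """Gates agent actions via grants + tripwires; appends evidence to a hash-chained log."""

    def __init__(self) -> None:
        self.log: list[dict] = []      # append-only: entries are never mutated
        self._prev_hash = "0" * 64     # genesis value for the hash chain

    def _append(self, entry: dict) -> None:
        # Chain each entry to its predecessor so tampering is detectable.
        entry["prev"] = self._prev_hash
        digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self.log.append(entry)
        self._prev_hash = digest

    def gate(self, grant: CapabilityGrant, tool: str, scope: str, payload: str) -> bool:
        now = time.time()
        # Tripwire: toy indirect prompt-injection pattern check (placeholder only).
        tripped = "ignore previous instructions" in payload.lower()
        allowed = grant.allows(tool, scope, now) and not tripped
        self._append({"agent": grant.agent_id, "tool": tool, "scope": scope,
                      "allowed": allowed, "tripped": tripped, "ts": now})
        return allowed
```

For example, a grant for `email.send` on internal recipients would permit a benign payload, while the tripwire denies one carrying an injection pattern; both decisions land in the log with an intact hash chain for later forensics.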