AEGIS: Constraint-first rollout selection for reliable long-horizon decision-making under explicit constraints
Abstract
Reliability failures in interactive decision-making are rarely average-case problems: systems fail in the tails, violating explicit constraints or entering constraint traps where every remaining option triggers a violation. We introduce AEGIS, a constraint-first rollout selection system that separates candidate proposal from constraint-aware commitment. AEGIS maintains a structured graph state, predicts state evolution with learned dynamics, and scores candidate rollouts using calibrated constraint risk to preserve future feasibility. On TREC CAsT 2022 (78 topics, 205 turns), AEGIS improves NDCG@10 by 10.4% (0.441→0.487, p<0.001) while reducing violations from 22.3% to 6.4% (−71.3%) and cutting tail violations to p99=18.4%. Against LLM agents (GPT-4+Memory, Claude-3+ReAct), AEGIS matches or exceeds utility while reducing p99 violations by 62–69% at 40–60× lower latency (p99 98 ms vs. seconds). Beyond averages, per-topic and turn-level analyses show consistent gains (71/78 topic wins; gap grows from +0.034 at turn 1 to +0.080 at turn 5), consistent with trap avoidance. Cross-benchmark evaluation (CAsT 2021; OR-QuAC), cross-domain Kubernetes remediation (2,500 episodes; p99 −68%), calibration analysis (ECE=0.024), robustness and scaling tests (up to 2,000-node graphs), and a randomized user study (50 participants, 500 sessions) collectively validate practical, deployable reliability.
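To make the idea of separating candidate proposal from constraint-aware commitment concrete, here is a minimal Python sketch of one possible constraint-first selection rule. It is not the AEGIS implementation: the abstract does not specify the scoring function, so the `Rollout` fields, the `select_rollout` function, the `risk_budget` threshold, and the example numbers are all hypothetical. The sketch assumes each candidate already carries a calibrated violation probability, filters on that risk first, and only then compares utility.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rollout:
    """A candidate rollout; all fields are hypothetical, for illustration only."""
    name: str
    utility: float         # estimated task utility (e.g. a relevance score)
    violation_risk: float  # assumed calibrated violation probability in [0, 1]


def select_rollout(candidates: Sequence[Rollout], risk_budget: float = 0.1) -> Rollout:
    """Constraint-first selection: filter by risk, then maximize utility.

    If no candidate fits within the risk budget, fall back to the
    lowest-risk candidate instead of the highest-utility one.
    """
    if not candidates:
        raise ValueError("need at least one candidate rollout")
    feasible = [c for c in candidates if c.violation_risk <= risk_budget]
    if feasible:
        return max(feasible, key=lambda c: c.utility)
    return min(candidates, key=lambda c: c.violation_risk)


candidates = [
    Rollout("greedy", utility=0.90, violation_risk=0.35),
    Rollout("balanced", utility=0.80, violation_risk=0.05),
    Rollout("cautious", utility=0.60, violation_risk=0.01),
]
print(select_rollout(candidates).name)  # balanced
```

With the default budget, the highest-utility candidate is rejected because its assumed risk exceeds the threshold, and the best remaining candidate is chosen. A weighted sum of utility and risk would be an alternative design, but it lets a large enough utility outweigh any risk, which is the tail behavior a constraint-first rule is meant to avoid.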