Suppressing echo cascades in language-model agents with multi-critic plan selection
Abstract
Language-model agents act across many turns, and unsupported claims persist when downstream turns endorse and repeat them. We decompose propagation into injection, endorsement, and repetition, and propose CascadeGuard, a multi-critic planner that separates proposal from selection. Lightweight critics score candidate plans for utility, norm compliance, and epistemic support under a fixed token budget (15,630 tokens/step). Across collaboration, dialogue, and retrieval tasks, CascadeGuard improves success by 12–13 percentage points while reducing norm violations by 69–77% and propagation by 78%. Mechanism interventions show that endorsement suppression yields a 2.19× larger propagation reduction than injection blocking, a result validated independently by an automatic detector and by human annotators using a pragmatic endorsement taxonomy (n=18 annotators, 840 items; effect ratio 2.34, 95% CI: 1.91–2.84). External validation on held-out benchmarks (MultiWOZ-Conflict, DSTC11-InfoSeek) confirms that the gains persist without retuning (+19.2 pp AA, −8.1 pp unsupported). Human validation (n=624; Krippendorff's α=0.843) confirms a strong metric–severity correlation (ρ=0.88), indicating that automatic propagation metrics reflect genuine epistemic harm.