Web Agent Agentic Reinforcement Learning Decision Model Under Multi-Cost and Failure Risk Constraints

Abstract

Intelligent agent interactions in real-world web environments are commonly constrained by request budgets, time delays, anti-crawling restrictions, and operational failure risks. Policies that optimize only task success rate often prove unusable in practice, exhibiting "high success but high cost" or "low risk but conservative failure." This paper proposes a constrained agentic reinforcement learning model for Web Agents that brings page access, search requests, and external API calls into a single long-term decision-making framework with associated costs. It incorporates both cost budget constraints and tail risk control into the optimization objective: a multidimensional cost vector comprising cumulative request count, total latency, and failure penalties is constructed, and budget compliance is achieved via Lagrange dual updates, while a CVaR risk term suppresses excessive exploration of high-failure-probability paths. Together these yield an adaptive balance among "completion rate, cost, and risk." Experiments were conducted across 30–70 site/page templates and 800–1,500 end-to-end web tasks (including information extraction, price comparison, form submission, and cross-page navigation). Interaction sequences spanned 20–120 steps with tool scales of 30–200. Performance was benchmarked against unconstrained RL, budget-constrained RL, and rule-based/scripted web agents, quantifying task completion rate, cost-per-success (C-PS), failure rate, and policy stability. Results demonstrate that at equivalent completion rates, our method reduces C-PS by 22%–31% and lowers failure rates by 18%–26% under high failure penalties. Under fixed budgets, task completion rates increase by 10%–16%, highlighting the necessity and effectiveness of constraint modeling for practical Web Agent deployment.
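The abstract names two mechanisms, Lagrange dual updates for budget compliance and a CVaR term for tail risk, without giving their formulas. The following is a minimal sketch of the generic textbook versions of both, not the paper's implementation. All function names, the step size `lr`, the risk weight `beta`, and the confidence level `alpha` are hypothetical, as is the choice to combine the terms as a simple penalized reward.

```python
import math
from typing import List, Sequence


def dual_update(lambdas: Sequence[float], avg_costs: Sequence[float],
                budgets: Sequence[float], lr: float = 0.1) -> List[float]:
    """Projected dual ascent: raise a multiplier when its cost exceeds
    the budget, lower it otherwise, and clip at zero."""
    return [max(0.0, lam + lr * (cost - budget))
            for lam, cost, budget in zip(lambdas, avg_costs, budgets)]


def cvar(losses: Sequence[float], alpha: float = 0.9) -> float:
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses."""
    if not losses:
        raise ValueError("losses must be non-empty")
    ordered = sorted(losses, reverse=True)  # largest losses first
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


def penalized_objective(reward: float, costs: Sequence[float],
                        lambdas: Sequence[float], tail_risk: float,
                        beta: float) -> float:
    """Assumed scalarization: reward minus multiplier-weighted costs
    minus a weighted tail-risk term."""
    cost_penalty = sum(lam * c for lam, c in zip(lambdas, costs))
    return reward - cost_penalty - beta * tail_risk


# Hypothetical cost vector: (request count, total latency, failure penalty).
lambdas = dual_update([0.0, 1.0, 0.05], avg_costs=[12.0, 3.0, 1.0],
                      budgets=[10.0, 5.0, 2.0])
risk = cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5)
score = penalized_objective(10.0, [12.0, 3.0, 1.0], lambdas, risk, beta=0.5)
```

In this sketch the multiplier for the request count grows because that cost overshoots its budget, while the other two shrink or are clipped at zero, which is the sense in which dual updates push a policy toward budget compliance over time.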
