Progressive Multi-Turn Reinforcement Learning for Dynamic User-Interactive Tool Agents
Abstract
Recent advances in reinforcement learning for large language models have produced powerful agent frameworks that achieve strong performance on multi-turn tool use, interactive search, and complex reasoning. However, existing reinforcement learning frameworks for large language model agents face three critical limitations: difficulty in handling dynamic user interactions owing to reliance on pre-scripted queries, limited scalability across varying interaction horizons under fixed scaling schedules, and substantial reward engineering overhead requiring domain-specific manual tuning. We introduce Progressive Multi-Turn Reinforcement Learning for Dynamic User-Interactive Tool Agents, a novel framework that integrates progressive user-interactive training to overcome sparse reward signals, adaptive horizon management that monitors performance metrics and adjusts training complexity accordingly, and domain-adaptive tool orchestration that learns optimal tool selection patterns across domains. Extensive experiments on WebArena, TAU-Bench, Berkeley Function-Calling Leaderboard Version 3, BabyAI, and SciWorld demonstrate that our method achieves a 28.4% success rate on WebArena and 76.3% on TAU-Bench, substantially outperforming baselines such as ReAct (16.2%) and MUA-RL (24.6%), while maintaining 94.7% performance on embodied reasoning tasks and 78.9% cross-domain performance retention. Our work establishes a unified framework for realistic user interaction training, performance-adaptive complexity scaling, and domain-flexible tool orchestration.
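The abstract describes adaptive horizon management only at a high level (monitor performance metrics, adjust training complexity accordingly). The sketch below is one plausible reading of that idea, not the paper's implementation: it tracks a rolling success rate and lengthens or shortens the per-episode turn budget. The class name, thresholds, window size, and step sizes are all illustrative assumptions.

```python
from collections import deque

class AdaptiveHorizonScheduler:
    """Illustrative sketch of performance-adaptive horizon scaling.
    All parameter values below are hypothetical, not taken from the paper."""

    def __init__(self, min_turns=2, max_turns=30, window=200,
                 promote_at=0.7, demote_at=0.4, step=2):
        self.horizon = min_turns                # current per-episode turn budget
        self.min_turns, self.max_turns = min_turns, max_turns
        self.results = deque(maxlen=window)     # rolling window of episode outcomes
        self.promote_at, self.demote_at = promote_at, demote_at
        self.step = step

    def record(self, success: bool) -> int:
        """Log one episode outcome and return the horizon to use for the next episode."""
        self.results.append(1.0 if success else 0.0)
        if len(self.results) == self.results.maxlen:
            rate = sum(self.results) / len(self.results)
            if rate >= self.promote_at:
                # Agent is coping at the current horizon: allow longer interactions.
                self.horizon = min(self.horizon + self.step, self.max_turns)
                self.results.clear()
            elif rate <= self.demote_at:
                # Agent is struggling: fall back to shorter interactions.
                self.horizon = max(self.horizon - self.step, self.min_turns)
                self.results.clear()
        return self.horizon
```

In a training loop, `record(...)` would be called once per rollout, and the returned horizon would cap the number of user-agent turns in the next sampled episode; the actual metrics and schedule used by the authors may differ.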