Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization


Abstract

This study formulates collaborative large language model (LLM) agents as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and optimizes group behavior using centralized training with decentralized execution (CTDE). A group-relative policy optimization (GRPO) objective is introduced to jointly optimize solution quality, coordination consistency, and response latency. Experiments are conducted on collaborative writing and collaborative coding benchmarks comprising 6,000 multi-agent episodes with 2–4 agents per task. Compared with single-agent and prompt-only collaboration baselines, the proposed approach achieves a 3.1× reduction in task completion time, a 19.4% improvement in output consistency, and a 21.7% increase in coding test pass rate, demonstrating effective performance optimization under partial observability.
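To make the multi-objective, group-relative idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the three objectives are folded into one scalar reward by a weighted sum, and that advantages are computed by normalizing each rollout's reward against the other rollouts sampled for the same task, as in the commonly described form of GRPO. The weights `w_q`, `w_c`, `w_l`, the function names, and the sample values are all hypothetical; the abstract does not specify the actual reward definition.

```python
from statistics import mean, pstdev


def composite_reward(quality, consistency, latency_s,
                     w_q=1.0, w_c=0.5, w_l=0.1):
    """Fold the three objectives into one scalar (hypothetical weights).

    Quality and consistency are rewarded; latency (seconds) is penalized.
    """
    return w_q * quality + w_c * consistency - w_l * latency_s


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is the reward's deviation from the group mean, scaled
    by the group standard deviation, so no learned value critic is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three illustrative multi-agent rollouts for the same task (made-up numbers).
rollouts = [
    {"quality": 0.9, "consistency": 0.8, "latency_s": 2.0},
    {"quality": 0.6, "consistency": 0.9, "latency_s": 1.0},
    {"quality": 0.7, "consistency": 0.5, "latency_s": 4.0},
]

rewards = [composite_reward(**r) for r in rollouts]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f}  advantage={a:+.3f}")
```

In this sketch a rollout with above-average composite reward receives a positive advantage and would be reinforced, while a slow or inconsistent rollout receives a negative one. How the paper weights or combines the three terms in practice may differ.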
