Behavioural Transparency in Multi-Agent RL: Strategic Decoupling and the Economic Optimisation of Tactical Shocks
Abstract
A primary challenge in deploying multi-agent reinforcement learning (RL) systems is the opacity of their emergent strategies, necessitating frameworks that render complex agentic behaviour interpretable. When RL agents interact with high-dimensional environments, they often execute decisions that appear erratic or "alien" to human observers. Behavioural economic theory suggests this perception stems from tactical myopia — the tendency of bounded biological agents to treat localised shocks (such as a tactical loss or victory) as terminal states, thereby degrading subsequent macroeconomic efficiency. Utilising high-fidelity telemetry from OpenAI Five in the imperfect-information environment of Dota 2, we provide empirical transparency into how these tactical shocks are processed. We demonstrate that average human cohorts exhibit intense tactical myopia: localised failures trigger extended economic contraction, while localised successes induce satisficing and stagnant resource acquisition. In contrast, econometric state-space matching reveals that the RL agent executes near-perfect strategic decoupling, treating shocks as neutral state transitions and immediately re-optimising its subsequent macroeconomic trajectory. Crucially, when the RL agent is compared exclusively against apex human experts in identical economic states, the statistical divergence between biological and synthetic agents largely vanishes at all economically meaningful horizons. This convergence provides a transparent econometric explanation for opaque AI behaviour: RL networks do not invent incomprehensible strategies; rather, they mathematically purge the tactical myopia inherent in average play, converging on the precise global-optimisation phenotype exhibited by apex biological experts.
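The state-space matching described above can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: it assumes each shock event is summarised by a vector of economic features, matches each agent shock state to its nearest human shock state, and compares mean post-shock resource-acquisition trajectories over a fixed horizon. All names (`matched_post_shock_divergence`, `horizon`) and the Euclidean matching rule are assumptions for illustration.

```python
import numpy as np

def matched_post_shock_divergence(human_states, human_deltas,
                                  agent_states, agent_deltas, horizon=5):
    """Toy state-space matching sketch (not the paper's method).

    For each agent shock state, find the nearest human shock state
    (Euclidean distance over the economic features) and compare the
    post-shock resource-acquisition trajectories over `horizon` steps.
    Returns the mean agent-minus-human divergence at each horizon step;
    values near zero indicate convergent post-shock behaviour.
    """
    diffs = []
    for s_agent, d_agent in zip(agent_states, agent_deltas):
        # nearest-neighbour match in economic state space
        j = np.argmin(np.linalg.norm(human_states - s_agent, axis=1))
        diffs.append(d_agent[:horizon] - human_deltas[j][:horizon])
    return np.mean(diffs, axis=0)
```

Under this sketch, comparing the agent against apex experts whose matched post-shock trajectories coincide with its own would drive the returned divergence toward zero, mirroring the convergence result reported in the abstract.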