Behavioural Transparency in Multi-Agent RL: Strategic Decoupling and the Economic Optimisation of Tactical Shocks
Abstract
A primary challenge in deploying multi-agent reinforcement learning (RL) systems is the opacity of their emergent strategies, necessitating frameworks that render complex agentic behaviour interpretable. When RL agents interact with high-dimensional environments, they often execute decisions that appear erratic or "alien" to human observers. Behavioural economic theory suggests this perception stems from tactical myopia — the tendency of bounded biological agents to treat localised shocks (such as a tactical loss or victory) as terminal states, thereby degrading subsequent macroeconomic efficiency. Utilising high-fidelity telemetry from OpenAI Five in the imperfect-information environment of Dota 2, we provide empirical transparency into how these tactical shocks are processed. We demonstrate that average human cohorts exhibit intense tactical myopia: localised failures trigger extended economic contraction, while localised successes induce satisficing and stagnant resource acquisition. In contrast, econometric state-space matching reveals that the RL agent executes near-perfect strategic decoupling, treating shocks as neutral state transitions and immediately re-optimising its subsequent macroeconomic trajectory. Crucially, when the RL agent is compared exclusively against apex human experts in identical economic states, the statistical divergence between biological and synthetic agents largely vanishes at all economically meaningful horizons. This convergence provides a transparent econometric explanation for opaque AI behaviour: RL networks do not invent incomprehensible strategies; rather, they mathematically purge the tactical myopia inherent in average play, converging on the precise global-optimisation phenotype exhibited by apex biological experts.
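The state-space matching described above can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: it assumes each shock event is summarised by a vector of economic features, matches each agent shock state to its nearest human shock state, and compares mean post-shock resource-acquisition trajectories over a fixed horizon. All names (`matched_post_shock_divergence`, `horizon`) and the Euclidean matching rule are assumptions for illustration.

```python
import numpy as np

def matched_post_shock_divergence(human_states, human_deltas,
                                  agent_states, agent_deltas, horizon=5):
    """Toy state-space matching sketch (not the paper's method).

    For each agent shock state, find the nearest human shock state
    (Euclidean distance over the economic features) and compare the
    post-shock resource-acquisition trajectories over `horizon` steps.
    Returns the mean agent-minus-human divergence at each horizon step;
    values near zero indicate convergent post-shock behaviour.
    """
    diffs = []
    for s_agent, d_agent in zip(agent_states, agent_deltas):
        # nearest-neighbour match in economic state space
        j = np.argmin(np.linalg.norm(human_states - s_agent, axis=1))
        diffs.append(d_agent[:horizon] - human_deltas[j][:horizon])
    return np.mean(diffs, axis=0)
```

Under this sketch, comparing the agent against apex experts whose matched post-shock trajectories coincide with its own would drive the returned divergence toward zero, mirroring the convergence result reported in the abstract.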