Population-dependent agent performance in non-transitive games: a multi-agent Rock--Paper--Scissors benchmark
Abstract
Non-transitive environments complicate the notion of a single ``best'' strategy: performance depends on the opponent population, and rankings are meaningful only relative to a specified opponent pool and protocol. We present a reproducible multi-agent benchmark for iterated Rock--Paper--Scissors and evaluate 54 agents from 18 archetypes---deep recurrent and transformer sequence models, actor--critic reinforcement learners, Bayesian/Markov predictors, classical classifiers, and rule-based baselines---in 500-round double round-robin tournaments across 10 random seeds. To support auditability, we define a simple regret certificate: a Lipschitz-type inequality that upper-bounds a best-response payoff gap using the $\ell_1$ discrepancy between an agent's predicted action distribution and an empirical estimate of the opponent's action distribution, computable online from logged predictions. Our experiments indicate that (i) recurrent predictors tend to achieve the highest and most stable scores, primarily by exploiting predictable opponents; (ii) rankings shift notably with the opponent pool (Spearman $\rho = 0.65$ between two evaluation rosters), with the top-ranked method changing across configurations; and (iii) the induced meta-game exhibits substantial non-transitivity, including 177 detected three-cycles in the pairwise payoff matrix. Under our 500-round online update budget and short-context design, transformer agents are competitive but do not outperform tuned recurrent baselines, which may reflect an inductive-bias mismatch in short-horizon adversarial play. Our code and analysis pipeline provide an extensible testbed for studying population-dependent evaluation and learning dynamics in canonical non-transitive games.
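To make the certificate concrete, the following minimal sketch computes the bound for the standard Rock--Paper--Scissors payoff matrix, whose entries lie in $\{-1, 0, 1\}$ so the relevant Lipschitz constant under the $\ell_1$ norm is 1. This is an illustration under those assumptions, not the paper's released code; the names PAYOFF and regret_certificate are hypothetical.

import numpy as np

# Row player's payoff U[a, b] for actions 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = np.array([[ 0., -1.,  1.],
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])

def regret_certificate(predicted, empirical):
    """Upper-bound the best-response payoff gap from the l1 discrepancy.

    predicted : agent's predicted opponent action distribution (length 3).
    empirical : empirical opponent action distribution from logged play.

    The expected payoff u(a, q) = sum_b U[a, b] * q[b] is linear in the
    opponent distribution q, so Hoelder's inequality gives
    |u(a, p) - u(a, q)| <= max|U| * ||p - q||_1 for every action a.
    Chaining this through the best responses a* (to q) and a_p (to p)
    yields u(a*, q) - u(a_p, q) <= 2 * max|U| * ||p - q||_1.
    """
    l1_gap = np.abs(np.asarray(predicted) - np.asarray(empirical)).sum()
    lipschitz = np.abs(PAYOFF).max()  # equals 1 for standard RPS payoffs
    return 2.0 * lipschitz * l1_gap

# Example: a uniform prediction against a rock-heavy empirical distribution.
bound = regret_certificate([1/3, 1/3, 1/3], [0.6, 0.2, 0.2])  # ~1.07

Because the bound depends only on the agent's logged predictions and a running empirical estimate of the opponent's actions, it can be maintained online during a tournament without access to the opponent's internals.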