AI-Modulated Pólya Urns (AIM-PU): A Unified Framework for Risk-Sensitive Contextual Bandits, Resource Allocation and Portfolio Design
Abstract
We propose the AI-modulated Pólya urn (AIM-PU), a stochastic framework bridging classical path-dependent reinforcement processes with adaptive, covariate-driven decision rules. While standard Pólya–Eggenberger urns assume fixed replacement kernels, AIM-PU introduces a heterogeneous reinforcement mechanism in which the replacement tensor is modulated by a learned policy adapted to a covariate filtration. We establish that this coupling preserves analytical tractability via martingale approximations. Under mild assumptions on policy regularity and graph irreducibility, we prove: (i) controlled linear growth of the total mass; (ii) almost-sure convergence of the normalized composition; and (iii) a functional central limit theorem (FCLT) in which fluctuations converge to a Gaussian mixture. We first verify the framework through extensive synthetic simulation studies, which confirm the theoretical convergence rates and distributional stability in volatile, non-stationary environments. We then demonstrate the model's practical utility in risk-sensitive portfolio allocation using real-world financial data (SPY, MSFT, DUK, TSLA) from 2020–2025. By embedding Conditional Value at Risk (CVaR) constraints directly into the urn dynamics, AIM-PU structurally enforces long-term risk limits. Our results show that AIM-PU outperforms stateless contextual bandits in both Sharpe ratio and drawdown minimization, providing a robust, interpretable, and mathematically grounded approach to automated financial decision-making.
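To make the mechanism concrete, the following is a minimal illustrative sketch of an urn whose reinforcement depends on a covariate through a policy. It is not the authors' implementation: the function name `simulate_modulated_urn`, the two-dimensional Gaussian covariate, the fixed random weight matrix `W` standing in for a learned policy, and the reinforcement rule `0.5 + softmax(W x)[color]` are all assumptions chosen only so that the per-step reinforcement stays bounded and the total mass grows linearly.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-d array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def simulate_modulated_urn(n_steps, n_colors=3, seed=0):
    """Simulate a toy covariate-modulated Polya urn (illustrative assumption only).

    Returns the final normalized composition and the total mass after each step.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical stand-in for a learned policy: fixed weights that map
    # a 2-d covariate to one score per color.
    W = rng.normal(size=(n_colors, 2))
    urn = np.ones(n_colors)  # initial composition: one unit of mass per color
    totals = np.empty(n_steps)
    for t in range(n_steps):
        x = rng.normal(size=2)  # covariate observed at step t
        # Draw a color with probability proportional to its current mass.
        color = rng.choice(n_colors, p=urn / urn.sum())
        # Policy-modulated reinforcement, bounded in (0.5, 1.5) by construction.
        reinforcement = 0.5 + softmax(W @ x)[color]
        urn[color] += reinforcement
        totals[t] = urn.sum()
    return urn / urn.sum(), totals


if __name__ == "__main__":
    shares, totals = simulate_modulated_urn(10_000)
    print("final normalized composition:", shares)
    print("average mass added per step:", (totals[-1] - 3.0) / len(totals))
```

Because each step adds between 0.5 and 1.5 units of mass in this toy rule, the total mass after n steps lies between 3 + 0.5n and 3 + 1.5n, which mirrors (in a much simpler setting) the controlled linear growth stated in result (i). Rerunning with different seeds gives an informal view of how the normalized composition settles over long horizons.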