AI-Modulated Pólya Urns (AIM-PU): A Unified Framework for Risk-Sensitive Contextual Bandits, Resource Allocation and Portfolio Design


Abstract

We propose the AI-modulated Pólya Urn (AIM-PU), a stochastic framework bridging classical path-dependent reinforcement processes with adaptive, covariate-driven decision rules. While standard Pólya–Eggenberger urns assume fixed replacement kernels, AIM-PU introduces a heterogeneous reinforcement mechanism in which the replacement tensor is modulated by a learned policy adapted to a covariate filtration. We establish that this coupling preserves analytical tractability via martingale approximations. Under mild assumptions on policy regularity and graph irreducibility, we prove: (i) controlled linear growth of the total mass; (ii) almost-sure convergence of the normalized composition; and (iii) a functional central limit theorem (FCLT) in which fluctuations converge to a Gaussian mixture. The framework is first verified through extensive simulation studies on synthetic data, which confirm the theoretical convergence rates and distributional stability in volatile, non-stationary environments. We then demonstrate the model's practical utility in risk-sensitive portfolio allocation using real-world financial data (SPY, MSFT, DUK, TSLA) from 2020–2025. By embedding Conditional Value at Risk (CVaR) constraints directly into the urn dynamics, AIM-PU structurally enforces long-term risk limits. Our results show that AIM-PU outperforms stateless contextual bandits in both Sharpe ratio and drawdown minimization, providing a robust, interpretable, and mathematically grounded approach to automated financial decision-making.
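
To make the idea of a covariate-modulated replacement rule concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not specify the replacement tensor or the learned policy, so everything here is an assumption: the function names `softmax` and `simulate_urn`, the scalar Gaussian covariate, the fixed random policy weights, and the 50/50 split between classical reinforcement of the drawn color and policy-directed reinforcement. The split is chosen so that exactly one unit of mass is added per step, which mirrors the linear growth of total mass mentioned in result (i).

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_urn(n_steps, n_colors=3, seed=0):
    """Simulate a toy covariate-modulated Polya urn (hypothetical design).

    Returns the final color masses and the history of normalized
    compositions after each step.
    """
    rng = random.Random(seed)
    counts = [1.0] * n_colors  # initial mass: one unit per color

    # ASSUMPTION: a fixed linear "policy" with one weight per color,
    # standing in for the learned policy described in the abstract.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_colors)]

    history = []
    for _ in range(n_steps):
        x = rng.gauss(0.0, 1.0)  # ASSUMPTION: scalar covariate

        # Classical urn draw: color chosen proportionally to current mass.
        drawn = rng.choices(range(n_colors), weights=counts)[0]

        # Policy output: a distribution over colors given the covariate.
        policy = softmax([w * x for w in weights])

        # Replacement: half a unit to the drawn color (path dependence),
        # half a unit spread according to the policy (modulation).
        counts[drawn] += 0.5
        for j in range(n_colors):
            counts[j] += 0.5 * policy[j]

        total = sum(counts)
        history.append([c / total for c in counts])

    return counts, history


if __name__ == "__main__":
    final_counts, path = simulate_urn(2000)
    print("total mass:", round(sum(final_counts), 6))
    print("final composition:", [round(p, 3) for p in path[-1]])
```

Under these assumptions the total mass after n steps is the initial mass plus n, and the normalized composition can be inspected along `path` to see how it settles over time. A CVaR constraint of the kind the abstract describes would enter through the policy term, which this sketch leaves unconstrained.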
