Learning Utility Models for Dynamic Inventory Control: A Reinforcement Learning Framework
Abstract
Quick-commerce operations face the dual challenge of rapidly fluctuating demand and strict fulfillment requirements in the form of lead times, delivery times, and storage space. We propose a utility-driven reinforcement learning (RL) framework that learns item-level utilities directly from observed demand and uses them to dynamically control inventory. Customer demand is modeled through a two-stage process: a Poisson distribution models the average number of orders received in an interval, and a negative binomial distribution simulates the actual order arrivals. Item-level demand is then derived using availability-aware multinomial logit (MNL) modeling. A key innovation is a censored-MNL correction that imputes preferences for items unavailable due to stockouts, mitigating bias in utility estimation and improving demand forecasts. We provide a convergence analysis under a trust-region (PPO-style) update and show, through simulation studies, that censored utility learning improves fill rates, reduces stockouts, and stabilizes inventory while preserving profitability compared to uncensored baselines and Poisson-only methods. The framework is modular (utilities can be parametric or neural) and extensible to assortment and SLA constraints. Collectively, our results demonstrate that treating learned utilities as the first-class driver of inventory decisions offers a robust, data-efficient route to optimizing fast-moving retail systems under uncertainty.
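To make the two-stage demand process concrete, the sketch below simulates one interval: a Poisson draw sets the interval's mean order count, a negative binomial draw generates the realized arrivals, and an availability-aware MNL allocates those orders across in-stock items. This is an illustrative reconstruction under stated assumptions, not the paper's implementation; the function name `simulate_interval_demand`, the parameters `lam` and `nb_dispersion`, and the specific negative-binomial parameterization are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_interval_demand(utilities, available, lam=20.0, nb_dispersion=5.0):
    """Simulate item-level demand for one interval (illustrative sketch).

    utilities     : array of item utilities (e.g., learned by the RL agent)
    available     : boolean mask of in-stock items
    lam           : Poisson rate for the interval's mean order count (assumed name)
    nb_dispersion : dispersion of the negative binomial arrival process (assumed name)
    """
    # Stage 1: Poisson draw for the interval's mean number of orders.
    mean_orders = rng.poisson(lam)

    # Stage 2: negative binomial around that mean generates the realized,
    # over-dispersed order arrivals (one possible parameterization).
    p = nb_dispersion / (nb_dispersion + mean_orders)
    n_orders = rng.negative_binomial(nb_dispersion, p)

    # Availability-aware MNL: stocked-out items receive zero choice probability.
    exp_u = np.where(available, np.exp(utilities), 0.0)
    probs = exp_u / exp_u.sum()

    # Each arriving order selects one item according to the MNL probabilities.
    return rng.multinomial(n_orders, probs)

# Example: five items, item 2 stocked out; its demand is censored to zero,
# which is the bias the censored-MNL correction is designed to undo.
u = np.array([1.0, 0.5, 1.5, 0.0, -0.5])
avail = np.array([True, True, False, True, True])
print(simulate_interval_demand(u, avail))
```

In this sketch, orders that would have gone to the stocked-out item are redistributed to available items, illustrating why naive utility estimation from observed sales is biased and why the censored-MNL correction imputes preferences for unavailable items.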