Learning Utility Models for Dynamic Inventory Control: A Reinforcement Learning Framework

Abstract

Quick-commerce operations face the dual challenge of rapidly fluctuating demand and strict fulfillment requirements in the form of lead times, delivery times, and storage space. We propose a utility-driven reinforcement learning (RL) framework that learns item-level utilities directly from observed demand and uses them to dynamically control inventory. Customer demand is modeled through a two-stage process: a Poisson distribution models the average number of orders received in an interval, while a negative binomial distribution simulates the actual order arrivals. Item-level demand is derived using availability-aware multinomial logit (MNL) modeling. A key innovation is a censored-MNL correction that imputes preferences for items that are unavailable due to stockouts, mitigating bias in utility estimation and improving demand forecasts. We provide a convergence analysis under a trust-region (PPO-style) update and show, through simulation studies, that censored utility learning improves fill rates, reduces stockouts, and stabilizes inventory while preserving profitability compared to uncensored baselines and Poisson-only methods. The framework is modular (utilities can be parametric or neural) and extensible to assortment and SLA constraints. Collectively, our results demonstrate that learning utilities as the first-class driver of inventory decisions offers a robust, data-efficient route to optimizing fast-moving retail systems under uncertainty.
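To make the two-stage demand process concrete, the following is a minimal sketch (not the authors' implementation) of interval-level demand generation with an availability-aware MNL choice step. It assumes a Poisson rate `lam` sets the mean order count per interval, a negative binomial with dispersion `r` realizes the actual arrivals, and each order chooses among in-stock items according to utilities `u` with an outside (no-purchase) option of utility zero; all parameter names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_interval_demand(u, available, lam=20.0, r=5.0):
    """Return per-item demand counts for one interval.

    u         : array of item utilities (length = number of items)
    available : boolean mask of in-stock items
    lam       : mean orders per interval (Poisson rate)
    r         : negative-binomial dispersion (smaller -> more overdispersed)
    """
    # Stage 1: realized order count. A negative binomial with mean lam and
    # dispersion r (a gamma-Poisson mixture), using p = r / (r + lam).
    n_orders = rng.negative_binomial(r, r / (r + lam))

    # Stage 2: availability-aware MNL choice for each order.
    # Stocked-out items get zero choice weight; the outside option has weight 1.
    weights = np.where(available, np.exp(u), 0.0)
    probs = np.append(weights, 1.0)          # last slot = no purchase
    probs /= probs.sum()

    choices = rng.choice(len(probs), size=n_orders, p=probs)
    demand = np.bincount(choices, minlength=len(probs))[:-1]  # drop outside option
    return demand

# Example: five items, item 4 stocked out.
u = np.array([0.8, 0.2, -0.1, 1.1, 0.5])
available = np.array([True, True, True, False, True])
print(simulate_interval_demand(u, available))
```

In this view, the censored-MNL correction described above would operate on the estimation side: because sales data only record choices among available items, fitting utilities naively understates demand for items that were stocked out, and the correction imputes those censored preferences before the utilities drive the inventory policy.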
