Reinforcement Learning for Optimal Replenishment in Stochastic Assembly Systems
Abstract
This study presents a reinforcement learning–based approach to optimizing replenishment policies under uncertainty, with the objective of minimizing total cost, comprising inventory holding, shortage, and ordering costs. The focus is on single-level assembly systems in which both component delivery lead times and finished-product demand are random. The problem is formulated as a Markov Decision Process (MDP) in which an agent determines the order quantity for each component while accounting for stochastic lead times and demand variability. A Deep Q-Network (DQN) algorithm is adapted to learn replenishment policies over a fixed planning horizon. To support learning, we develop a tailored simulation environment that captures multi-component interactions, random lead times, and variable demand, together with a modular and realistic cost structure. The environment provides dynamic state transitions, lead-time sampling, and flexible order-reception modeling, giving the agent a high-fidelity training ground. To further improve convergence and policy quality, we incorporate local search mechanisms and multiple action-space discretizations per component. Experimental results show that the proposed method significantly reduces stockouts and overall costs while improving the system's adaptability to uncertainty. These findings highlight the potential of deep reinforcement learning as a data-driven, dynamic approach to inventory management in complex and uncertain supply chain environments.
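To make the cost structure concrete, one plausible form of the objective is sketched below. The notation is our own illustrative choice (the abstract does not give the paper's symbols): over a horizon of T periods, the policy π trades off per-unit holding costs, shortage penalties, and fixed ordering charges.

```latex
% Illustrative notation (not from the paper): I_{i,t} is on-hand inventory of
% component i in period t, B_t is unmet finished-product demand, q_{i,t} is the
% order quantity, h_i the holding cost, b the shortage cost, k_i the fixed
% ordering cost, and T the planning horizon.
\min_{\pi} \; \mathbb{E}\!\left[ \sum_{t=1}^{T} \Big(
    \sum_{i} h_i\, I_{i,t} \;+\; b\, B_t \;+\; \sum_{i} k_i\, \mathbf{1}\{q_{i,t} > 0\}
\Big) \right]
```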
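A minimal sketch of how such a simulation environment might be organized is shown next, assuming a gym-style step interface, Poisson-distributed demand and lead times, and one unit of each component per finished product. All names and parameter values (AssemblyEnv, holding_cost, and so on) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a single-level assembly replenishment environment with
# stochastic lead times and demand; reward is the negative period cost.
import numpy as np

class AssemblyEnv:
    def __init__(self, n_components=3, horizon=52, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n = n_components
        self.horizon = horizon
        self.holding_cost = np.full(self.n, 1.0)   # per unit, per period (assumed)
        self.order_cost = np.full(self.n, 10.0)    # fixed cost per order placed
        self.shortage_cost = 50.0                  # per unit of unmet demand
        self.reset()

    def reset(self):
        self.t = 0
        self.inventory = np.full(self.n, 20.0)     # on-hand stock per component
        self.pipeline = []                          # (arrival_period, component, qty)
        return self._state()

    def _state(self):
        # State: current period, on-hand stock, and in-transit quantity per component.
        in_transit = np.zeros(self.n)
        for _, i, qty in self.pipeline:
            in_transit[i] += qty
        return np.concatenate(([self.t], self.inventory, in_transit))

    def step(self, order_qty):
        cost = 0.0
        # 1. Place orders; each order draws a random lead time (lead-time sampling).
        for i, q in enumerate(order_qty):
            if q > 0:
                lead = 1 + self.rng.poisson(2)
                self.pipeline.append((self.t + lead, i, float(q)))
                cost += self.order_cost[i]
        # 2. Receive orders whose lead time has elapsed (flexible order reception).
        arrived = [o for o in self.pipeline if o[0] <= self.t]
        self.pipeline = [o for o in self.pipeline if o[0] > self.t]
        for _, i, q in arrived:
            self.inventory[i] += q
        # 3. Random finished-product demand; assembly consumes one unit of every
        #    component per finished unit, so output is capped by the scarcest one.
        demand = self.rng.poisson(5)
        feasible = int(min(demand, self.inventory.min()))
        self.inventory -= feasible
        cost += self.shortage_cost * (demand - feasible)
        cost += float(self.holding_cost @ self.inventory)
        self.t += 1
        return self._state(), -cost, self.t >= self.horizon
```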
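One way the per-component action-space discretization could be wired to a standard DQN, which outputs a single discrete action index, is to enumerate the Cartesian product of candidate order sizes; the grids below are hypothetical, not taken from the paper.

```python
# Map a flat DQN action index to per-component order quantities.
import itertools

grids = [[0, 10, 20], [0, 5, 15], [0, 25]]   # candidate order sizes per component
actions = list(itertools.product(*grids))    # 3 * 3 * 2 = 18 joint actions
order_qty = actions[7]                       # e.g. DQN picks index 7 -> (10, 5, 0)
```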