Deep Reinforcement Learning for Optimal Replenishment in Stochastic Assembly Systems

Lativa Sid Ahmed Abdellahi
Zeinebou Zoubeir
Yahya Mohamed
Ahmedou Haouba
Sidi Hmetty

Abstract

This study presents a reinforcement learning–based approach to optimize replenishment policies in the presence of uncertainty, with the objective of minimizing total costs, including inventory holding, shortage, and ordering costs. The focus is on single-level assembly systems, where both component delivery lead times and finished product demand are subject to randomness. The problem is formulated as a Markov decision process (MDP), in which an agent determines optimal order quantities for each component by accounting for stochastic lead times and demand variability. The Deep Q-Network (DQN) algorithm is adapted and employed to learn optimal replenishment policies over a fixed planning horizon. To enhance learning performance, we develop a tailored simulation environment that captures multi-component interactions, random lead times, and variable demand, along with a modular and realistic cost structure. The environment enables dynamic state transitions, lead time sampling, and flexible order reception modeling, providing a high-fidelity training ground for the agent. To further improve convergence and policy quality, we incorporate local search mechanisms and multiple action space discretizations per component. Simulation results show that the proposed method converges to stable ordering policies after approximately 100 episodes. The agent achieves an average service level of 96.93%, and stockout events are reduced by over 100% relative to early training phases. The system maintains component inventories within operationally feasible ranges, and cost components—holding, shortage, and ordering—are consistently minimized across 500 training episodes. These findings highlight the potential of deep reinforcement learning as a data-driven and adaptive approach to inventory management in complex and uncertain supply chains.
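The MDP described in the abstract can be illustrated with a toy simulation environment. The sketch below is not the authors' environment: the class name `AssemblyEnv`, the helper `rollout`, the uniform lead-time and demand distributions, the one-unit-per-component bill of materials, the backlogging of unmet demand, and all cost parameters are assumptions chosen only to make the state, action, and cost structure concrete. A DQN agent, as named in the abstract, would replace the fixed policy passed to `rollout`.

```python
import random


class AssemblyEnv:
    """Toy single-level assembly system (illustrative assumptions throughout).

    Each finished unit consumes one unit of every component. Lead times and
    demand are drawn uniformly; unmet demand is backlogged.
    """

    def __init__(self, n_components=2, horizon=20, max_lead=3, max_demand=5,
                 hold_cost=1.0, short_cost=10.0, order_cost=2.0, seed=0):
        self.n = n_components
        self.horizon = horizon
        self.max_lead = max_lead
        self.max_demand = max_demand
        self.hold_cost = hold_cost      # per unit of component stock per period
        self.short_cost = short_cost    # per unit of backlogged demand per period
        self.order_cost = order_cost    # fixed cost per order placed
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.stock = [0] * self.n
        self.pipeline = []              # open orders: (arrival_period, component, qty)
        self.backlog = 0
        return self._state()

    def _state(self):
        # State = on-hand stock, on-order quantity per component, and backlog.
        on_order = [0] * self.n
        for _, comp, qty in self.pipeline:
            on_order[comp] += qty
        return tuple(self.stock) + tuple(on_order) + (self.backlog,)

    def step(self, orders):
        """Apply one order quantity per component; return (state, reward, done)."""
        cost = 0.0
        # Place orders, each with its own random lead time (in periods).
        for comp, qty in enumerate(orders):
            if qty > 0:
                lead = self.rng.randint(1, self.max_lead)
                self.pipeline.append((self.t + lead, comp, qty))
                cost += self.order_cost
        # Realize demand; assembly is limited by the scarcest component.
        demand = self.backlog + self.rng.randint(0, self.max_demand)
        assembled = min([demand] + self.stock)
        self.stock = [s - assembled for s in self.stock]
        self.backlog = demand - assembled
        cost += self.hold_cost * sum(self.stock) + self.short_cost * self.backlog
        # Advance time and receive every order that is now due.
        self.t += 1
        due = [o for o in self.pipeline if o[0] <= self.t]
        self.pipeline = [o for o in self.pipeline if o[0] > self.t]
        for _, comp, qty in due:
            self.stock[comp] += qty
        # Reward is the negative total cost, so maximizing reward minimizes cost.
        return self._state(), -cost, self.t >= self.horizon


def rollout(env, policy):
    """Run one episode under `policy` (state -> order tuple); return total reward."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


if __name__ == "__main__":
    # Baseline: always order 3 units of each component.
    print(rollout(AssemblyEnv(seed=1), lambda state: (3, 3)))
```

Defining the reward as the negative of the summed holding, shortage, and ordering costs mirrors the cost-minimization objective stated in the abstract; comparing a learned policy against a constant-order baseline like the one above is one simple way to check whether training has improved on a naive rule.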

Version published to 10.3390/math13142229
Jul 9, 2025
Version published to 10.20944/preprints202505.2062.v1
May 27, 2025

Related articles

Learning Utility Models for Dynamic Inventory Control : A Reinforcement Learning Framework

This article has 1 author:
1. Milon
This article has no evaluations. Latest version Jan 23, 2026

Bridging Theory and Practice: A Stochastic Learning-Optimization Model for Resilient Automotive Supply Chains

This article has 2 authors:
1. Muhammad Shahnawaz
2. Adeel Safder
This article has no evaluations. Latest version Dec 11, 2025

A Stochastic Process Optimization Framework for Reshoring Supply Chains: Integrating Digital Twins with Mixed-Integer Programming

This article has 2 authors:
1. Manikandan Chandran
2. Vimal Shanmuganathan
This article has no evaluations. Latest version Jan 29, 2026
