A multimodal deep reinforcement learning framework for multi-period inventory decision-making under demand uncertainty

Abstract

We investigate the problem of multi-period inventory decision-making driven by multi-source multimodal data and propose a deep reinforcement learning method, WET-TD3, that integrates multimodal environmental perception with policy optimization to generate end-to-end replenishment quantities for each period. First, based on demand-related structured features and unstructured customer review texts from multiple sources, we design a set of multimodal feature-aware agent neural networks incorporating word embeddings and Transformer modules, thereby constructing a state space that adapts to dynamic market environments. Second, we enhance the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to support a multimodal Actor-Critic architecture tailored to high-dimensional heterogeneous inputs. Additionally, we introduce delayed policy updates, experience replay, and exploration noise mechanisms to improve training stability. Finally, experiments on real-world data show that WET-TD3 significantly outperforms benchmark approaches in multi-period inventory management, achieving an average cost reduction of over 53.69%. The method dynamically adjusts replenishment strategies in response to changes in the relative magnitude of unit holding and underage costs, maintaining stable performance under varying cost structures. These findings highlight that the deep integration of unstructured textual reviews and structured features from multiple sources is fundamental to achieving high-accuracy replenishment, while the reinforcement learning framework effectively supports long-term optimization goals in uncertain and dynamic demand environments.
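To make the multimodal state construction concrete, the sketch below (in PyTorch, not the authors' code) illustrates one plausible reading of the actor side: structured demand features and tokenized customer reviews are encoded separately (word embeddings plus a Transformer encoder for the text), fused, and mapped to a continuous replenishment quantity. All layer sizes, the class name MultimodalActor, and the output scaling are illustrative assumptions; the full WET-TD3 training loop (twin critics, delayed policy updates, experience replay, exploration noise) is not shown.

```python
# Minimal sketch of a multimodal actor for continuous replenishment decisions.
# Assumed architecture, not the paper's implementation.
import torch
import torch.nn as nn

class MultimodalActor(nn.Module):
    def __init__(self, struct_dim=8, vocab_size=10000, embed_dim=64,
                 n_heads=4, max_order_qty=500.0):
        super().__init__()
        self.max_order_qty = max_order_qty
        # Text branch: word embeddings + a single Transformer encoder layer.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
        # Structured branch: demand-related numeric features.
        self.struct_encoder = nn.Sequential(nn.Linear(struct_dim, 64), nn.ReLU())
        # Fusion head mapping the joint state to a bounded order quantity.
        self.head = nn.Sequential(
            nn.Linear(embed_dim + 64, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, struct_feats, review_tokens):
        # review_tokens: (batch, seq_len) integer token ids; 0 marks padding.
        text = self.embedding(review_tokens)
        text = self.text_encoder(text,
                                 src_key_padding_mask=(review_tokens == 0))
        text = text.mean(dim=1)  # mean-pool token representations
        state = torch.cat([text, self.struct_encoder(struct_feats)], dim=-1)
        # Scale the sigmoid output to a replenishment quantity in [0, max_order_qty].
        return self.head(state) * self.max_order_qty


if __name__ == "__main__":
    actor = MultimodalActor()
    struct_feats = torch.randn(2, 8)                  # e.g. price, lead time, season
    review_tokens = torch.randint(1, 10000, (2, 20))  # tokenized customer reviews
    print(actor(struct_feats, review_tokens).shape)   # torch.Size([2, 1])
```

In a TD3-style setup, an exploration noise term would be added to this actor's output during training, and the same multimodal encoding would feed the twin critics alongside the chosen action.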