Conservative Risk-Sensitive Reinforcement Learning for Reliable Decision-Making Under Uncertainty

Abstract

This paper addresses complex decision-making scenarios characterized by high uncertainty and costly errors, proposing a risk-sensitive, decision-oriented reinforcement learning method. It focuses on the reliability problems that arise under offline data conditions from tail instability in the reward distribution and from out-of-distribution actions. Methodologically, the decision-making process is modeled in a Markov framework, with the reward distribution as the learning object so that value information under adverse conditions is retained. On this basis, a conditional value-at-risk (CVaR) metric is introduced to explicitly characterize and suppress tail risk, so that policy optimization no longer relies solely on expected returns. To mitigate estimation bias and over-extrapolation in offline learning, conservative constraints based on the behavioral distribution are further incorporated: by limiting the deviation between the learned policy and the implicit behavioral distribution in the data, out-of-distribution action expansion is suppressed and the controllability of policy updates is improved. The overall framework unifies risk measurement and conservative learning in a single optimization objective, yielding a policy learning mechanism that balances return and safety. Comparative experimental results show that the method achieves superior overall performance in average return, tail-reward robustness, and safety-related indicators, validating the joint modeling of risk-sensitive objectives and conservative constraints and providing an auditable, tunable risk-control approach for highly reliable intelligent decision-making systems.
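To make the combined objective concrete, the following is a minimal sketch of the two ingredients the abstract describes: an empirical CVaR estimator over sampled returns and a divergence penalty keeping the policy close to the behavioral distribution. The function names (`cvar`, `conservative_risk_objective`) and parameters (`alpha`, `lam`) are illustrative assumptions, not the paper's exact formulation; the paper's actual risk measure and constraint may differ in detail.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical conditional value-at-risk: the mean of the worst
    alpha-fraction of sampled returns (assumption: lower return = worse)."""
    sorted_r = np.sort(returns)                     # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(sorted_r)))) # size of the tail
    return sorted_r[:k].mean()

def conservative_risk_objective(returns, policy_probs, behavior_probs,
                                alpha=0.1, lam=1.0):
    """Sketch of a unified objective: maximize the CVaR of returns while
    penalizing deviation (here, KL divergence) of the learned policy from
    the implicit behavioral distribution in the offline data."""
    kl = np.sum(policy_probs * np.log(policy_probs / behavior_probs))
    return cvar(returns, alpha) - lam * kl

# Hypothetical usage: 1000 sampled episode returns and a 3-action policy.
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=1000)
policy = np.array([0.7, 0.2, 0.1])    # learned policy (assumed)
behavior = np.array([0.5, 0.3, 0.2])  # behavioral distribution (assumed)
print(cvar(returns, alpha=0.1))
print(conservative_risk_objective(returns, policy, behavior))
```

Maximizing this quantity, rather than the mean return alone, is what makes the optimization risk-sensitive (via the CVaR term) and conservative (via the penalty on out-of-distribution policy mass), in line with the co-modeling the abstract describes.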
