Conservative Risk-Sensitive Reinforcement Learning for Reliable Decision-Making Under Uncertainty

Abstract

This paper addresses complex decision-making scenarios characterized by high uncertainty and costly errors, proposing a risk-sensitive, decision-oriented reinforcement learning method. It focuses on the reliability problems that arise under offline data conditions from tail instability in the reward distribution and from out-of-distribution actions. Methodologically, the decision-making process is modeled in a Markov framework, with the reward distribution as the learning object so that value information under adverse conditions is retained. On this basis, a conditional value-at-risk (CVaR) metric is introduced to explicitly characterize and suppress tail risk, so that policy optimization no longer relies solely on expected returns. To mitigate estimation bias and over-extrapolation in offline learning, conservative constraints based on the behavioral distribution are further incorporated: by limiting the deviation between the learned policy and the implicit behavioral distribution in the data, out-of-distribution action expansion is suppressed and the controllability of policy updates is improved. The overall framework unifies risk measurement and conservative learning in a single optimization objective, yielding a policy learning mechanism that balances return and safety. Comparative experimental results show that the method achieves superior overall performance in average return, tail-reward robustness, and safety-related indicators, validating the joint modeling of risk-sensitive objectives and conservative constraints and providing an auditable, tunable risk-control approach for highly reliable intelligent decision-making systems.
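To make the combined objective concrete, the following is a minimal sketch of the two ingredients the abstract describes: an empirical CVaR estimator over sampled returns and a divergence penalty keeping the policy close to the behavioral distribution. The function names (`cvar`, `conservative_risk_objective`) and parameters (`alpha`, `lam`) are illustrative assumptions, not the paper's exact formulation; the paper's actual risk measure and constraint may differ in detail.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical conditional value-at-risk: the mean of the worst
    alpha-fraction of sampled returns (assumption: lower return = worse)."""
    sorted_r = np.sort(returns)                     # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(sorted_r)))) # size of the tail
    return sorted_r[:k].mean()

def conservative_risk_objective(returns, policy_probs, behavior_probs,
                                alpha=0.1, lam=1.0):
    """Sketch of a unified objective: maximize the CVaR of returns while
    penalizing deviation (here, KL divergence) of the learned policy from
    the implicit behavioral distribution in the offline data."""
    kl = np.sum(policy_probs * np.log(policy_probs / behavior_probs))
    return cvar(returns, alpha) - lam * kl

# Hypothetical usage: 1000 sampled episode returns and a 3-action policy.
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=1000)
policy = np.array([0.7, 0.2, 0.1])    # learned policy (assumed)
behavior = np.array([0.5, 0.3, 0.2])  # behavioral distribution (assumed)
print(cvar(returns, alpha=0.1))
print(conservative_risk_objective(returns, policy, behavior))
```

Maximizing this quantity, rather than the mean return alone, is what makes the optimization risk-sensitive (via the CVaR term) and conservative (via the penalty on out-of-distribution policy mass), in line with the co-modeling the abstract describes.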
