Trust Guided Reinforcement Learning for Safe Robot Navigation with Dynamic Window Approach
Abstract
End-to-end deep reinforcement learning (DRL) policies offer flexible navigation capabilities but often suffer from poor generalization and unsafe behaviors in unseen or complex environments. In contrast, classical local planners such as the Dynamic Window Approach (DWA) provide strong short-term safety guarantees, yet frequently fail in cluttered static scenes due to limited-horizon reasoning. To bridge this gap, we propose Trust-SAC, a novel trust-aware reinforcement learning framework that enables an agent to dynamically assess the reliability of its own actions by comparing them against a DWA expert—without executing the expert’s commands. The policy learns to output both control actions $(v, \omega)$ and a scalar trust weight $\tau$, which modulates a trust-based reward derived from the critic’s evaluation of the policy’s action versus the expert’s. This mechanism allows the agent to adaptively balance exploration, efficiency, and safety based on real-time environmental risk. Evaluated across four diverse Gazebo environments of increasing complexity—including one in which DWA fails entirely—Trust-SAC achieves significantly higher task success rates than SAC, PPO, and DWA, while maintaining competitive path efficiency. Our results highlight that embedding a learnable self-assessment mechanism grounded in expert comparison can enhance the robustness and generalization of end-to-end navigation policies without compromising their autonomy.
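The abstract describes the trust weight $\tau$ modulating a reward derived from the critic's comparison of the policy's action against the DWA expert's. The exact formulation is not given here; the sketch below is only one plausible reading, in which $\tau$ scales the critic's advantage of the policy action over the expert action. The function name and the additive form are assumptions, not the paper's definition.

```python
def trust_bonus(q_policy: float, q_expert: float, tau: float) -> float:
    """Hypothetical trust-based reward term: the learned trust weight tau
    scales the critic's advantage of the policy's action over the DWA
    expert's action. Illustrative only; the paper's exact reward may differ."""
    return tau * (q_policy - q_expert)

# Toy example: critic values for the policy's and the expert's actions.
# With tau = 0 the expert comparison is ignored; with tau = 1 the full
# advantage is added to the task reward.
q_pi, q_dwa = 1.2, 0.9
for tau in (0.0, 0.5, 1.0):
    print(f"tau={tau:.1f} -> bonus={trust_bonus(q_pi, q_dwa, tau):+.2f}")
```

Under this reading, a high trust weight rewards the agent when its own action is valued above the expert's, while a low trust weight lets the agent defer to task reward alone in risky scenes.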