Trust Guided Reinforcement Learning for Safe Robot Navigation with Dynamic Window Approach


Abstract

End-to-end deep reinforcement learning (DRL) policies offer flexible navigation capabilities but often suffer from poor generalization and unsafe behaviors in unseen or complex environments. In contrast, classical local planners like the Dynamic Window Approach (DWA) provide strong short-term safety guarantees yet frequently fail in cluttered static scenes due to limited horizon reasoning. To bridge this gap, we propose Trust-SAC, a novel trust-aware reinforcement learning framework that enables an agent to dynamically assess the reliability of its own actions by comparing them against a DWA expert—without executing the expert’s commands. The policy learns to output both control actions $(v, \omega)$ and a scalar trust weight $\tau$, which modulates a trust-based reward derived from the critic’s evaluation of the policy versus the expert. This mechanism allows the agent to adaptively balance exploration, efficiency, and safety based on real-time environmental risk. Evaluated across four diverse Gazebo environments with increasing complexity—including one where DWA completely fails—Trust-SAC demonstrates significantly higher task success rates than SAC, PPO, and DWA, while maintaining competitive path efficiency. Our results highlight that embedding a learnable self-assessment mechanism grounded in expert comparison can enhance the robustness and generalization of end-to-end navigation policies without compromising their autonomy.
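The abstract does not give the exact form of the trust-based reward, so the sketch below is only an illustrative assumption: the policy's scalar trust weight $\tau$ scales a critic-based comparison between the policy's action and the DWA expert's action, and the result shapes the environment reward. The function name, the `beta` coefficient, and the additive blending are all hypothetical.

```python
def trust_shaped_reward(base_reward, tau, q_policy, q_expert, beta=1.0):
    """Illustrative sketch (assumed form, not the paper's definition).

    base_reward -- reward from the navigation environment
    tau         -- scalar trust weight in [0, 1] output by the policy
    q_policy    -- critic's estimate Q(s, a_policy) for the policy's action
    q_expert    -- critic's estimate Q(s, a_expert) for the DWA expert's
                   action, evaluated but never executed
    beta        -- hypothetical scaling coefficient
    """
    # Positive when the critic judges the policy's action better than
    # the expert's; the learned trust weight modulates its influence.
    advantage = q_policy - q_expert
    return base_reward + beta * tau * advantage
```

Under this assumed form, a high $\tau$ amplifies the penalty when the critic prefers the expert's action and the bonus when it prefers the policy's own, letting the agent trade off autonomy against expert agreement as risk varies.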
