On the Design of Reward Function for Reinforcement Learning–Based Path Following Control Using the TD3 Algorithm
Abstract
Reward design is one of the most influential factors in determining the performance of reinforcement learning (RL) algorithms in continuous-control tasks. This paper presents a comparative study of three reward formulations for training a TD3 agent to perform three-dimensional path following using a particle kinematic system modeled as double-integrator dynamics. The first reward is a linear formulation that assigns proportional penalties based on normalized position and velocity tracking errors. The second reward employs an exponential decay of the same error terms to emphasize high-precision tracking and reduce sensitivity to small deviations. The third reward, a Piecewise Quadratic Reward (PQR), combines quadratic penalties on position and velocity errors with a positive discrete progress bonus, forming a hybrid structure that balances guidance and corrective feedback. All three reward functions are evaluated across three representative trajectories—vertical ascent, straight-line cruise, and helical motion—and analyzed with respect to convergence behavior, tracking accuracy, and control smoothness. The results show that the exponential reward produces more stable trajectories with higher tracking accuracy and smoother control actions on complex paths, while the PQR formulation penalizes the agent heavily, leading to less stable learning. These findings provide practical insights into how reward shaping influences the behavior of TD3-based path-following agents and offer guidelines for designing effective reward functions in continuous-control RL applications.
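The three reward structures described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the weights, decay rates, bonus magnitude, and the progress criterion are all assumptions introduced for clarity, and the error terms are taken to be already normalized.

```python
import numpy as np

def linear_reward(e_p, e_v, w_p=1.0, w_v=0.5):
    """Linear formulation: penalties proportional to normalized
    position error e_p and velocity error e_v (weights assumed)."""
    return -(w_p * e_p + w_v * e_v)

def exponential_reward(e_p, e_v, k_p=2.0, k_v=1.0):
    """Exponential formulation: reward decays from 1 toward 0 as the
    weighted error sum grows, rewarding high-precision tracking
    (decay rates assumed)."""
    return np.exp(-(k_p * e_p + k_v * e_v))

def pqr_reward(e_p, e_v, progress_made, w_p=1.0, w_v=0.5, bonus=1.0):
    """Piecewise Quadratic Reward (PQR): quadratic penalties on both
    errors plus a discrete bonus when the agent makes progress along
    the path (progress criterion and bonus value assumed)."""
    r = -(w_p * e_p**2 + w_v * e_v**2)
    if progress_made:  # e.g. the agent passed the next path waypoint
        r += bonus
    return r
```

The quadratic terms in the PQR grow much faster than the linear penalties once errors exceed 1, which is consistent with the observation that this formulation penalizes the agent heavily and can destabilize learning.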