On the Design of Reward Function for Reinforcement Learning–Based Path Following Control Using the TD3 Algorithm
Abstract
Reward design is one of the most influential factors in determining the performance of reinforcement learning (RL) algorithms in continuous-control tasks. This paper presents a comparative study of three reward formulations for training a TD3 agent to perform three-dimensional path following using a particle kinematic system modeled as double-integrator dynamics. The first reward is a linear formulation that assigns proportional penalties based on normalized position and velocity tracking errors. The second reward employs an exponential decay of the same error terms to emphasize high-precision tracking and reduce sensitivity to small deviations. The third reward, a Piecewise Quadratic Reward (PQR), combines quadratic penalties on position and velocity errors with a positive discrete progress bonus, forming a hybrid structure that balances guidance and corrective feedback. All three reward functions are evaluated across three representative trajectories—vertical ascent, straight-line cruise, and helical motion—and analyzed with respect to convergence behavior, tracking accuracy, and control smoothness. The results show that the exponential reward produces more stable trajectories with higher tracking accuracy and smoother control actions on complex paths, while the PQR formulation penalizes the agent heavily, leading to less stable learning. These findings provide practical insights into how reward shaping influences the behavior of TD3-based path-following agents and offer guidelines for designing effective reward functions in continuous-control RL applications.
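The three reward structures described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the weights, decay rates, bonus magnitude, and the progress criterion are all assumptions introduced for clarity, and the error terms are taken to be already normalized.

```python
import numpy as np

def linear_reward(e_p, e_v, w_p=1.0, w_v=0.5):
    """Linear formulation: penalties proportional to normalized
    position error e_p and velocity error e_v (weights assumed)."""
    return -(w_p * e_p + w_v * e_v)

def exponential_reward(e_p, e_v, k_p=2.0, k_v=1.0):
    """Exponential formulation: reward decays from 1 toward 0 as the
    weighted error sum grows, rewarding high-precision tracking
    (decay rates assumed)."""
    return np.exp(-(k_p * e_p + k_v * e_v))

def pqr_reward(e_p, e_v, progress_made, w_p=1.0, w_v=0.5, bonus=1.0):
    """Piecewise Quadratic Reward (PQR): quadratic penalties on both
    errors plus a discrete bonus when the agent makes progress along
    the path (progress criterion and bonus value assumed)."""
    r = -(w_p * e_p**2 + w_v * e_v**2)
    if progress_made:  # e.g. the agent passed the next path waypoint
        r += bonus
    return r
```

The quadratic terms in the PQR grow much faster than the linear penalties once errors exceed 1, which is consistent with the observation that this formulation penalizes the agent heavily and can destabilize learning.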