Characterization of a Fixed Reinforcement Learning Policy for Aerial Robot with Suspended Payload under Variable Flight Conditions

Abstract

Flights with suspended payloads are particularly challenging because of their coupled dynamics, which lead to instability and increased sensitivity to disturbances. Although reinforcement learning (RL) has been applied successfully to controller design, the generalization and robustness of a single policy remain open areas of investigation. In this study, we characterized the performance and robustness of a single RL policy for an aerial robot with a suspended payload across different trajectory profiles, including a smooth, feasible lemniscate curve and a sharp-turning, infeasible pentagram, under varying velocity references (0.5 m/s and 1.0 m/s) and crosswind disturbances (1.0 m/s). We trained the policy using Proximal Policy Optimization (PPO) with collective thrust and body-rate (CTBR) control in a high-fidelity physics simulator based on the SimpleFlight framework. Real-world experiments on the Crazyflie 2.1 platform show that the single RL policy generalizes to different trajectory profiles and velocity references and maintains stability under a crosswind disturbance of up to 1.0 m/s. This is a substantial challenge for such a small-class platform and the even smaller payload suspended beneath it, because the aerodynamic forces are significant compared to the available control authority and system mass. The policy was systematically evaluated using the mean Euclidean distance (MED) error, cable length transitions, and swing angle distributions. Although the policy maintained robust control performance, the experimental results indicated performance degradation at higher velocities owing to increased dynamic challenges such as nonlinear aerodynamic drag and actuator saturation. This study provides a detailed performance characterization that highlights both the generalization capability of a single payload-aware RL policy in real-world applications and the limitations arising from the hybrid dynamics of the system.
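As an illustration of the tracking metric named above, the following is a minimal sketch of how a mean Euclidean distance (MED) error could be computed against a lemniscate reference. It is not the authors' implementation: the function names, the Gerono parameterization of the lemniscate, and the `scale`, `period`, and `height` parameters are all assumptions made for this example, and the abstract does not state whether the error is measured on the vehicle or on the payload.

```python
import math


def lemniscate_reference(t, scale=1.0, period=10.0, height=1.0):
    """Hypothetical figure-eight reference (Gerono lemniscate) at constant height.

    The parameterization and default values are assumptions for illustration,
    not the trajectory definition used in the study.
    """
    phase = 2.0 * math.pi * t / period
    x = scale * math.sin(phase)
    y = scale * math.sin(phase) * math.cos(phase)
    return (x, y, height)


def mean_euclidean_distance(actual, reference):
    """Average Euclidean distance between paired 3D position samples."""
    if not actual or len(actual) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(p, q) for p, q in zip(actual, reference))
    return total / len(actual)


# Usage: a synthetic "flown" path that is offset 0.1 m along x at every sample.
reference = [lemniscate_reference(0.1 * k) for k in range(100)]
actual = [(x + 0.1, y, z) for (x, y, z) in reference]
med = mean_euclidean_distance(actual, reference)
```

With a constant 0.1 m offset, the MED equals the offset (up to floating-point rounding), which makes this a convenient sanity check before applying the metric to logged flight data.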