Decoupled Multi-Dimensional Reinforcement Learning with Temporal Communication for Vision-Based UAV Control in Partially Observable Environments

Abstract

This paper proposes a novel reinforcement learning approach for controlling vision-based unmanned aerial vehicles (UAVs) in partially observable and dynamic environments. Traditional reinforcement learning methods typically employ a single, monolithic policy to simultaneously manage multiple control dimensions—throttle, roll rate, pitch rate, and yaw rate—leading to complex feature representations and suboptimal performance under partial observability. To address this limitation, we introduce a decoupled policy framework that decomposes the UAV's action space into separate control dimensions, effectively transforming the original partially observable Markov decision process (POMDP) into a specialized multi-agent setting. Each action dimension is managed by an independent policy sharing a common feature extraction backbone and communicating through a customized Bidirectional Long Short-Term Memory (Bi-LSTM) network. This architecture allows specialized representation learning while preserving necessary coupling between control dimensions. Experiments conducted in the photorealistic Flightmare simulator on three challenging tasks—obstacle avoidance, target tracking, and object search—demonstrate significant performance improvements compared to several state-of-the-art baselines. Our approach notably reduces collision rates, enhances navigation efficiency, and achieves higher task success rates, thereby validating the efficacy of policy decoupling and temporal communication mechanisms in vision-based UAV reinforcement learning.
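The architecture described above can be sketched as follows. This is not the authors' code: the layer sizes, the use of a linear backbone in place of a vision CNN, the learned per-dimension embeddings, and all names (`DecoupledUAVPolicy`, `comm`, `heads`) are illustrative assumptions. The key idea it captures is that the four control dimensions share one feature backbone, exchange information through a Bi-LSTM run across the dimension axis, and each emit their own action parameters through an independent head.

```python
# Hedged sketch of a decoupled multi-dimensional policy (assumptions throughout):
# four control "agents" (throttle, roll rate, pitch rate, yaw rate) share a
# feature backbone and communicate via a Bi-LSTM before independent heads act.
import torch
import torch.nn as nn

class DecoupledUAVPolicy(nn.Module):
    def __init__(self, obs_dim=64, feat_dim=32, hidden=16, n_dims=4):
        super().__init__()
        self.n_dims = n_dims
        # Shared feature-extraction backbone (stand-in for a CNN over camera images).
        self.backbone = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # Learned per-dimension embeddings give each control "agent" its own view.
        self.dim_embed = nn.Parameter(torch.zeros(n_dims, feat_dim))
        # Bi-LSTM run across the four action dimensions lets the otherwise
        # independent policies exchange information (the coupling mechanism).
        self.comm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        # One independent Gaussian head per control dimension: (mean, log_std).
        self.heads = nn.ModuleList([nn.Linear(2 * hidden, 2) for _ in range(n_dims)])

    def forward(self, obs):                        # obs: (batch, obs_dim)
        feat = self.backbone(obs)                  # (batch, feat_dim)
        seq = feat.unsqueeze(1) + self.dim_embed   # (batch, n_dims, feat_dim)
        out, _ = self.comm(seq)                    # (batch, n_dims, 2*hidden)
        # Each head reads its dimension's communicated features.
        return torch.stack([h(out[:, i]) for i, h in enumerate(self.heads)], dim=1)

policy = DecoupledUAVPolicy()
params = policy(torch.randn(8, 64))  # (batch=8, n_dims=4, 2 params per action)
```

Treating the action dimensions as a sequence is one way to realize the "temporal communication" coupling; the paper's actual Bi-LSTM customization may differ.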
