Decoupled Multi-Dimensional Reinforcement Learning with Temporal Communication for Vision-Based UAV Control in Partially Observable Environments

Abstract

This paper proposes a novel reinforcement learning approach for controlling vision-based unmanned aerial vehicles (UAVs) in partially observable and dynamic environments. Traditional reinforcement learning methods typically employ a single, monolithic policy to simultaneously manage multiple control dimensions—throttle, roll rate, pitch rate, and yaw rate—leading to complex feature representations and suboptimal performance under partial observability. To address this limitation, we introduce a decoupled policy framework that decomposes the UAV's action space into separate control dimensions, effectively transforming the original partially observable Markov decision process (POMDP) into a specialized multi-agent setting. Each action dimension is managed by an independent policy sharing a common feature extraction backbone and communicating through a customized Bidirectional Long Short-Term Memory (Bi-LSTM) network. This architecture allows specialized representation learning while preserving necessary coupling between control dimensions. Experiments conducted in the photorealistic Flightmare simulator on three challenging tasks—obstacle avoidance, target tracking, and object search—demonstrate significant performance improvements compared to several state-of-the-art baselines. Our approach notably reduces collision rates, enhances navigation efficiency, and achieves higher task success rates, thereby validating the efficacy of policy decoupling and temporal communication mechanisms in vision-based UAV reinforcement learning.
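The architecture described above can be sketched as follows. This is not the authors' code: the layer sizes, the use of a linear backbone in place of a vision CNN, the learned per-dimension embeddings, and all names (`DecoupledUAVPolicy`, `comm`, `heads`) are illustrative assumptions. The key idea it captures is that the four control dimensions share one feature backbone, exchange information through a Bi-LSTM run across the dimension axis, and each emit their own action parameters through an independent head.

```python
# Hedged sketch of a decoupled multi-dimensional policy (assumptions throughout):
# four control "agents" (throttle, roll rate, pitch rate, yaw rate) share a
# feature backbone and communicate via a Bi-LSTM before independent heads act.
import torch
import torch.nn as nn

class DecoupledUAVPolicy(nn.Module):
    def __init__(self, obs_dim=64, feat_dim=32, hidden=16, n_dims=4):
        super().__init__()
        self.n_dims = n_dims
        # Shared feature-extraction backbone (stand-in for a CNN over camera images).
        self.backbone = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # Learned per-dimension embeddings give each control "agent" its own view.
        self.dim_embed = nn.Parameter(torch.zeros(n_dims, feat_dim))
        # Bi-LSTM run across the four action dimensions lets the otherwise
        # independent policies exchange information (the coupling mechanism).
        self.comm = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        # One independent Gaussian head per control dimension: (mean, log_std).
        self.heads = nn.ModuleList([nn.Linear(2 * hidden, 2) for _ in range(n_dims)])

    def forward(self, obs):                        # obs: (batch, obs_dim)
        feat = self.backbone(obs)                  # (batch, feat_dim)
        seq = feat.unsqueeze(1) + self.dim_embed   # (batch, n_dims, feat_dim)
        out, _ = self.comm(seq)                    # (batch, n_dims, 2*hidden)
        # Each head reads its dimension's communicated features.
        return torch.stack([h(out[:, i]) for i, h in enumerate(self.heads)], dim=1)

policy = DecoupledUAVPolicy()
params = policy(torch.randn(8, 64))  # (batch=8, n_dims=4, 2 params per action)
```

Treating the action dimensions as a sequence is one way to realize the "temporal communication" coupling; the paper's actual Bi-LSTM customization may differ.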
