Empowering Aerial Maneuver Games Through Model-Based Constrained Reinforcement Learning
Abstract
Achieving full autonomy in Within-Visual-Range air combat with a single, end-to-end learned policy is a formidable challenge: agents must navigate stochastic dynamics and sparse rewards while mastering the delicate trade-off between aggression and survival. We introduce a Model-Based Reinforcement Learning agent that combines the Dreamer framework with safety-aware objectives to meet this challenge. To enhance learning stability and foresight in this demanding domain, we augment Dreamer's world model with an Information Noise-Contrastive Estimation (InfoNCE) loss for long-range dependencies, categorical predictors to robustly model outcomes, Dyna-style actor-critic updates to ground the policy, and a Lipschitz regularizer to constrain value error. Furthermore, our framework integrates a population-based self-play pipeline with curriculum initialization, enabling rapid strategic discovery without expert priors. To validate our approach, we conducted evaluations in a high-fidelity 6-Degree-of-Freedom simulation, where our agent demonstrated superior zero-shot performance, significantly higher sample efficiency than model-free baselines, and rapid fine-tuning against novel opponents, highlighting a viable path toward deployable autonomous agents.
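For context, the contrastive objective mentioned above typically takes the standard InfoNCE form shown below; this is a minimal sketch with illustrative notation, and the exact loss applied to the world model's latent states in this work may differ.

$$
\mathcal{L}_{\mathrm{InfoNCE}} \;=\; -\,\mathbb{E}\!\left[\log \frac{\exp\!\big(f(z_t,\, s_{t+k})\big)}{\sum_{j=1}^{N} \exp\!\big(f(z_t,\, s_j)\big)}\right]
$$

Here $z_t$ denotes a latent summary of the history at time $t$, $s_{t+k}$ a positive sample drawn from the same trajectory $k$ steps ahead, $s_j$ negative samples drawn from other trajectories, and $f$ a learned similarity score; these symbols are assumptions for illustration rather than the paper's notation. Maximizing agreement with the true future while contrasting against negatives encourages the latent state to carry long-range predictive information.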