UAV Mission Planning for Post-Disaster Victim Localisation via Federated Reinforcement Learning
Abstract
Rapid localisation of trapped victims after urban disasters is essential but remains difficult due to signal intermittency, energy constraints, and the impracticality of training multi-UAV coordination policies solely through real-world flights. This study addresses the need for sample-efficient, privacy-preserving reinforcement learning in such high-risk environments. We adapt a federated multi-agent reinforcement learning framework originally developed for communication-constrained environments and apply it to post-disaster victim localisation. The approach integrates a lightweight LoS/NLoS surrogate channel model and a PSO-based position estimator for unknown devices, together with simple feasibility checks on energy and altitude separation. UAVs learn locally in simulated environments and only exchange model parameters to maintain privacy. The proposed architecture is evaluated on two synthetic post-earthquake urban environments. Results show that model-aided federated agents significantly outperform independent Q-learning and standard QMIX baselines in both search performance and convergence speed. The adapted framework enables effective coordination under realistic energy and TDMA constraints, achieving high victim coverage with reduced training overhead. This study demonstrates that combining environmental knowledge with decentralised learning architectures can substantially improve the efficiency and robustness of UAV coordination strategies in post-disaster scenarios. The results highlight a viable pathway toward scalable, privacy-preserving, and field-deployable reinforcement learning systems for real-world search-and-rescue missions.
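As an illustration of the parameter-exchange step described above, in which UAVs train locally and share only model parameters, the following is a minimal sketch of weighted federated averaging. It is not the authors' implementation: the function name `federated_average`, the parameter layout (a dictionary of flat float lists), and the weighting by local sample count are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical layout: each UAV holds named parameter vectors as flat float lists.
Params = Dict[str, List[float]]


def federated_average(local_params: List[Params], weights: List[float]) -> Params:
    """Weighted element-wise average of per-UAV parameter dictionaries.

    Only parameters are combined; no local observations or trajectories
    are passed in, which mirrors the privacy constraint in the abstract.
    """
    if not local_params or len(local_params) != len(weights):
        raise ValueError("need exactly one weight per parameter set")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    averaged: Params = {}
    for name in local_params[0]:
        length = len(local_params[0][name])
        averaged[name] = [
            sum(w * p[name][i] for p, w in zip(local_params, weights)) / total
            for i in range(length)
        ]
    return averaged


# Two UAVs with toy parameters (illustrative values only).
uav_a = {"q_head": [1.0, 2.0]}
uav_b = {"q_head": [3.0, 6.0]}

# Equal weights give a plain mean of the two parameter sets.
equal_mix = federated_average([uav_a, uav_b], weights=[1.0, 1.0])

# Assumed weighting by local sample count: UAV B contributed three times as much data.
weighted_mix = federated_average([uav_a, uav_b], weights=[1.0, 3.0])
```

In this sketch each UAV would overwrite its local parameters with the averaged result before the next round of local training. How often to aggregate and how to weight each UAV are design choices that the abstract does not specify.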