AI-Enhanced Rescue Drone with Multi-Modal Vision and Cognitive Agentic Architecture
Abstract
In post-disaster search and rescue (SAR) operations, unmanned aerial vehicles (UAVs) are essential tools, yet the large volume of raw visual data often overwhelms human operators by providing isolated, context-free information. This paper presents a system with a novel cognitive–agentic architecture that elevates the UAV from an intelligent tool to a proactive reasoning partner. The core innovation lies in the ability of a large language model (LLM) to perform high-level semantic reasoning, logical validation, and robust self-correction through internal feedback loops. A visual perception module based on a custom-trained YOLO11 model feeds the cognitive core, which performs contextual analysis and hazard assessment, enabling a complete perception–reasoning–action cycle. The system also incorporates a physical payload delivery module for first-aid supplies, which acts on prioritized, actionable recommendations to reduce operator cognitive load and accelerate victim assistance. This work therefore presents the first LLM-driven architecture of its kind, transforming the drone from a mere data-gathering tool into a proactive reasoning partner and demonstrating a viable path toward reducing operator cognitive load in critical missions.
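The abstract does not give implementation details, but the perception–reasoning–action cycle it describes can be illustrated with a minimal sketch. Everything in the sketch below is a hypothetical stand-in, not the authors' code: the `Detection`, `Perception`, `CognitiveCore`, and `PayloadBay` classes, the class labels, and the rule-based validation logic merely stand in for the custom-trained YOLO11 detector, the LLM cognitive core with its internal self-correction loop, and the first-aid payload module.

```python
# Minimal, self-contained sketch of the perception-reasoning-action cycle.
# The detector and LLM are replaced with trivial stand-ins; all names and
# thresholds here are illustrative assumptions, not the paper's implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str          # e.g. "person" or "fire" (class set is assumed)
    confidence: float   # detector score in [0, 1]
    bbox: tuple         # (x1, y1, x2, y2) pixel coordinates


class Perception:
    """Stand-in for the custom-trained YOLO11 visual perception module."""

    def detect(self, frame) -> List[Detection]:
        # A real system would run YOLO11 inference on the camera frame here.
        return [Detection("person", 0.91, (120, 80, 200, 240)),
                Detection("fire", 0.76, (300, 60, 420, 180))]


class CognitiveCore:
    """Stand-in for the LLM: semantic reasoning, hazard assessment,
    and logical validation with an internal self-correction loop."""

    def plan(self, detections: List[Detection], max_retries: int = 2) -> dict:
        for _ in range(max_retries + 1):
            candidate = self._draft_plan(detections)    # would be an LLM call
            if self._validate(candidate, detections):   # internal feedback loop
                return candidate
        return {"action": "alert_operator", "reason": "validation failed"}

    def _draft_plan(self, detections: List[Detection]) -> dict:
        victims = [d for d in detections if d.label == "person" and d.confidence > 0.5]
        hazards = [d for d in detections if d.label in {"fire", "flood"}]
        if victims:
            return {"action": "deliver_payload",
                    "target": victims[0].bbox,
                    "hazards_nearby": len(hazards)}
        return {"action": "continue_search"}

    def _validate(self, plan: dict, detections: List[Detection]) -> bool:
        # Logical check: a delivery plan must reference an actual detection.
        if plan["action"] != "deliver_payload":
            return True
        return any(d.bbox == plan["target"] for d in detections)


class PayloadBay:
    """Stand-in for the first-aid payload delivery module."""

    def execute(self, plan: dict) -> None:
        if plan["action"] == "deliver_payload":
            print(f"Releasing first-aid kit at {plan['target']} "
                  f"({plan['hazards_nearby']} hazard(s) nearby)")
        else:
            print(f"No delivery: {plan['action']}")


if __name__ == "__main__":
    frame = None  # a camera frame would be captured here
    detections = Perception().detect(frame)
    plan = CognitiveCore().plan(detections)
    PayloadBay().execute(plan)
```

The point of the sketch is the control flow rather than the components: perception produces structured detections, the cognitive core drafts and validates a plan (retrying when validation fails, mirroring the self-correction loop), and only a validated plan reaches the actuation layer.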