Exploring the Cognitive Capabilities of Large Language Models in Autonomous and Swarm Navigation Systems


Abstract

This study explores the cognitive potential of Large Language Models (LLMs) in autonomous navigation and swarm control systems. The research investigates whether a multimodal LLM, specifically a customized version of LLaVA 7B (Large Language and Vision Assistant), can serve as the central decision-making unit of an autonomous vehicle equipped with a camera and distance sensors. The developed prototype integrates a Raspberry Pi module for data acquisition and motor control with a main computational unit running the LLM via the Ollama platform. Communication between the modules combines a REST API for sensory data transfer with TCP sockets for real-time command exchange. Without fine-tuning, the system relies on advanced prompt engineering and context management to ensure consistent reasoning and structured, JSON-based control outputs. Experimental results demonstrate that the model can interpret real-time visual and distance data to generate reliable driving commands together with descriptive situational reasoning. These findings suggest that LLMs possess emerging cognitive abilities applicable to real-world robotic navigation and lay the groundwork for future swarm systems capable of cooperative exploration and decision-making in dynamic environments.
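
To make the described control loop concrete, the following is a minimal sketch of how the main computational unit might submit a camera frame and a distance reading to a locally served LLaVA model through Ollama's REST API and parse a structured JSON driving command in return. The Ollama /api/generate endpoint and its request fields (model, prompt, images, format, stream) are standard; the prompt wording, the command schema, and the decide() helper are illustrative assumptions, not the paper's actual implementation.

    import base64
    import json
    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

    # Hypothetical command schema; the paper's actual prompt and JSON format are not given here.
    PROMPT_TEMPLATE = (
        "You are the navigation controller of a small autonomous vehicle.\n"
        "Front distance sensor reading: {distance_cm} cm.\n"
        "Based on the attached camera frame and the distance reading, reply with "
        'a single JSON object: {{"command": "forward|left|right|stop", '
        '"reasoning": "<one sentence>"}}'
    )

    def decide(frame_path: str, distance_cm: float) -> dict:
        """Send one camera frame and a distance reading to a local LLaVA model
        served by Ollama, and parse the structured JSON driving command."""
        with open(frame_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("ascii")

        payload = {
            "model": "llava:7b",
            "prompt": PROMPT_TEMPLATE.format(distance_cm=distance_cm),
            "images": [image_b64],  # Ollama accepts base64-encoded images
            "format": "json",       # constrain the reply to valid JSON
            "stream": False,
        }
        resp = requests.post(OLLAMA_URL, json=payload, timeout=60)
        resp.raise_for_status()
        # Ollama wraps the model's generated text in the "response" field.
        return json.loads(resp.json()["response"])

    if __name__ == "__main__":
        decision = decide("frame.jpg", distance_cm=42.0)
        print(decision["command"], "-", decision["reasoning"])

In the architecture outlined above, the Raspberry Pi module would periodically post its sensor readings to the main unit over the REST channel, while the resulting command would be relayed back to the motor controller over the TCP socket connection.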
