Exploring the Cognitive Capabilities of Large Language Models in Autonomous and Swarm Navigation Systems


Abstract

This study explores the cognitive potential of Large Language Models (LLMs) in autonomous navigation and swarm control systems. The research investigates whether a multimodal LLM, specifically a customized version of LLaVA 7B (Large Language and Vision Assistant), can serve as the central decision-making unit of an autonomous vehicle equipped with a camera and distance sensors. The developed prototype integrates a Raspberry Pi module for data acquisition and motor control with a main computational unit running the LLM via the Ollama platform. Communication between the modules combines a REST API for sensory data transfer with TCP sockets for real-time command exchange. Without fine-tuning, the system relies on advanced prompt engineering and context management to ensure consistent reasoning and structured, JSON-based control outputs. Experimental results demonstrate that the model can interpret real-time visual and distance data to generate reliable driving commands together with descriptive situational reasoning. These findings suggest that LLMs possess emerging cognitive abilities applicable to real-world robotic navigation and lay the groundwork for future swarm systems capable of cooperative exploration and decision-making in dynamic environments.
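
To make the described control loop concrete, the following is a minimal sketch of how the main computational unit might submit a camera frame and a distance reading to a locally served LLaVA model through Ollama's REST API and parse a structured JSON driving command in return. The Ollama /api/generate endpoint and its request fields (model, prompt, images, format, stream) are standard; the prompt wording, the command schema, and the decide() helper are illustrative assumptions, not the paper's actual implementation.

    import base64
    import json
    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

    # Hypothetical command schema; the paper's actual prompt and JSON format are not given here.
    PROMPT_TEMPLATE = (
        "You are the navigation controller of a small autonomous vehicle.\n"
        "Front distance sensor reading: {distance_cm} cm.\n"
        "Based on the attached camera frame and the distance reading, reply with "
        'a single JSON object: {{"command": "forward|left|right|stop", '
        '"reasoning": "<one sentence>"}}'
    )

    def decide(frame_path: str, distance_cm: float) -> dict:
        """Send one camera frame and a distance reading to a local LLaVA model
        served by Ollama, and parse the structured JSON driving command."""
        with open(frame_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("ascii")

        payload = {
            "model": "llava:7b",
            "prompt": PROMPT_TEMPLATE.format(distance_cm=distance_cm),
            "images": [image_b64],  # Ollama accepts base64-encoded images
            "format": "json",       # constrain the reply to valid JSON
            "stream": False,
        }
        resp = requests.post(OLLAMA_URL, json=payload, timeout=60)
        resp.raise_for_status()
        # Ollama wraps the model's generated text in the "response" field.
        return json.loads(resp.json()["response"])

    if __name__ == "__main__":
        decision = decide("frame.jpg", distance_cm=42.0)
        print(decision["command"], "-", decision["reasoning"])

In the architecture outlined above, the Raspberry Pi module would periodically post its sensor readings to the main unit over the REST channel, while the resulting command would be relayed back to the motor controller over the TCP socket connection.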
