NavAI: A Generalizable LLM Framework for Navigation Tasks in Virtual Reality Environments

Abstract

Navigation is a fundamental capability for automated exploration in immersive virtual reality (VR) environments. While prior work has largely focused on path optimization in 360-degree image datasets or in pre-trained 3D simulators, such approaches are difficult to apply directly to general VR applications involving unseen scenes and dynamic interactions. To address this gap, we present NavAI, a general and extensible navigation framework that leverages large language models (LLMs) to support both basic action commands and multi-step goal-oriented navigation in arbitrary VR environments. NavAI decomposes navigation into perception, decision-making, and control execution, and explores multiple model variants that balance effectiveness, efficiency, and robustness. In particular, NavAI establishes an LLM-based navigation baseline and investigates optimization strategies for virtual scene understanding and navigation goal decision-making, enabling systematic comparison between the baseline and optimized designs. Evaluation shows that the voting-based Full Mode achieves a 100% success rate across all tested environments and distance regimes, while Gemini-based single-model variants (with and without YOLO) match this effectiveness without ensemble voting. Moreover, YOLO-assisted variants reduce perception latency by up to ≈ 40% and lower token consumption by as much as ≈ 48% compared to LLM-only baselines, whereas Full Mode incurs up to 3.5× higher token cost due to redundant decision-making. Based on these findings, we discuss the limitations of LLM-driven navigation and outline directions for future optimization and practical deployment in immersive VR systems.
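
To make the perception–decision–control decomposition concrete, the sketch below shows one minimal way such a pipeline with a majority-vote ensemble (as in the described Full Mode) could be wired together. All names (Observation, perceive, vote, navigate_step, the stub deciders) are our own illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of a perception -> decision -> control-execution
# pipeline with majority voting, loosely modeled on the abstract's
# description of NavAI's Full Mode. Names are illustrative, not from the paper.

from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]

@dataclass
class Observation:
    """Scene summary produced by the perception stage (e.g., an LLM
    describing the frame, or a YOLO detector feeding object labels)."""
    description: str

def perceive(frame: bytes) -> Observation:
    # Placeholder: a real system would run a vision model or
    # YOLO-assisted detection on the rendered VR frame.
    return Observation(description="door ahead, obstacle on the left")

def vote(decisions: List[str]) -> str:
    """Majority vote across model variants (ensemble decision-making)."""
    return Counter(decisions).most_common(1)[0][0]

def navigate_step(frame: bytes,
                  deciders: List[Callable[[Observation], str]]) -> str:
    obs = perceive(frame)
    decisions = [decide(obs) for decide in deciders]
    action = vote(decisions)
    assert action in ACTIONS
    return action  # handed to the control-execution stage

# Usage: three stub "LLM" deciders; real ones would prompt different models.
if __name__ == "__main__":
    deciders = [
        lambda obs: "move_forward",
        lambda obs: "move_forward",
        lambda obs: "turn_left",
    ]
    print(navigate_step(b"<frame>", deciders))  # -> move_forward
```

A single-model variant corresponds to passing one decider (skipping the vote), which is consistent with the abstract's observation that voting adds token cost through redundant decision-making.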
