NavAI: A Generalizable LLM Framework for Navigation Tasks in Virtual Reality Environments
Abstract
Navigation is a fundamental capability for automated exploration in immersive virtual reality (VR) environments. While prior work has largely focused on path optimization in 360-degree image datasets or in pre-trained 3D simulators, such approaches are difficult to apply directly to general VR applications involving unseen scenes and dynamic interactions. To address this gap, we present NavAI, a general and extensible navigation framework that leverages large language models (LLMs) to support both basic action commands and multi-step goal-oriented navigation in arbitrary VR environments. NavAI decomposes navigation into perception, decision-making, and control execution, and explores multiple model variants that balance effectiveness, efficiency, and robustness. In particular, NavAI establishes an LLM-based navigation baseline and explores optimization strategies for virtual scene understanding and navigation goal decision-making, enabling systematic comparison between the baseline and optimized designs. Evaluation shows that the voting-based Full Mode achieves a 100% success rate across all tested environments and distance regimes, while Gemini-based single-model variants (with and without YOLO) match this effectiveness without ensemble voting. Moreover, YOLO-assisted variants reduce perception latency by up to ≈ 40% and lower token consumption by as much as ≈ 48% compared to LLM-only baselines, whereas Full Mode incurs up to 3.5× higher token cost due to redundant decision-making. Based on these findings, we discuss the limitations of LLM-driven navigation and outline directions for future optimization and practical deployment in immersive VR systems.
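To illustrate the perception / decision-making / control decomposition and the voting-based Full Mode described above, the following is a minimal sketch. All function names and the toy decision models are hypothetical stand-ins for illustration; they are not NavAI's actual API, and the real system queries LLMs (and optionally YOLO) rather than these hand-written rules.

```python
from collections import Counter

def perceive(scene):
    # Perception stage (hypothetical): extract a coarse observation of the
    # scene. The real framework uses LLM- or YOLO-based scene understanding.
    return {"objects": scene["objects"], "goal": scene["goal"]}

def decide(observation, model):
    # Decision stage: each model maps an observation to a navigation action.
    return model(observation)

def full_mode_vote(actions):
    # Full Mode (as described in the abstract): majority vote over the
    # actions proposed by several models.
    return Counter(actions).most_common(1)[0][0]

def step(scene, models):
    # One perception -> decision -> control cycle; the returned action
    # would be executed by the control layer in the VR environment.
    observation = perceive(scene)
    proposals = [decide(observation, m) for m in models]
    return full_mode_vote(proposals)

# Toy stand-ins for LLM-based decision models.
toward_goal = lambda obs: "move_forward" if obs["goal"] in obs["objects"] else "turn_left"
always_forward = lambda obs: "move_forward"
cautious = lambda obs: "turn_left"

scene = {"objects": ["door", "chair"], "goal": "door"}
print(step(scene, [toward_goal, always_forward, cautious]))  # -> move_forward
```

Note that ensemble voting trades token cost for robustness, consistent with the reported up-to-3.5× token overhead of Full Mode relative to single-model variants.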