NavAI: A Generalizable LLM Framework for Navigation Tasks in Virtual Reality Environments
Abstract
Navigation is a fundamental capability for automated exploration in immersive virtual reality (VR) environments. While prior work has largely focused on path optimization in 360-degree image datasets or in pre-trained 3D simulators, such approaches are difficult to apply directly to general VR applications involving unseen scenes and dynamic interactions. To address this gap, we present NavAI, a general and extensible navigation framework that leverages large language models (LLMs) to support both basic action commands and multi-step goal-oriented navigation in arbitrary VR environments. NavAI decomposes navigation into perception, decision-making, and control execution, and explores multiple model variants that balance effectiveness, efficiency, and robustness. In particular, NavAI establishes an LLM-based navigation baseline and explores optimization strategies for virtual scene understanding and navigation goal decision-making, enabling systematic comparison between the baseline and optimized designs. Evaluation shows that the voting-based Full Mode achieves a 100% success rate across all tested environments and distance regimes, while Gemini-based single-model variants (with and without YOLO) match this effectiveness without ensemble voting. Moreover, YOLO-assisted variants reduce perception latency by up to ≈ 40% and lower token consumption by as much as ≈ 48% compared to LLM-only baselines, whereas Full Mode incurs up to 3.5× higher token cost due to redundant decision-making. Based on these findings, we discuss the limitations of LLM-driven navigation and outline directions for future optimization and practical deployment in immersive VR systems.
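To illustrate the perception / decision-making / control decomposition and the voting-based Full Mode described above, the following is a minimal sketch. All function names and the toy decision models are hypothetical stand-ins for illustration; they are not NavAI's actual API, and the real system queries LLMs (and optionally YOLO) rather than these hand-written rules.

```python
from collections import Counter

def perceive(scene):
    # Perception stage (hypothetical): extract a coarse observation of the
    # scene. The real framework uses LLM- or YOLO-based scene understanding.
    return {"objects": scene["objects"], "goal": scene["goal"]}

def decide(observation, model):
    # Decision stage: each model maps an observation to a navigation action.
    return model(observation)

def full_mode_vote(actions):
    # Full Mode (as described in the abstract): majority vote over the
    # actions proposed by several models.
    return Counter(actions).most_common(1)[0][0]

def step(scene, models):
    # One perception -> decision -> control cycle; the returned action
    # would be executed by the control layer in the VR environment.
    observation = perceive(scene)
    proposals = [decide(observation, m) for m in models]
    return full_mode_vote(proposals)

# Toy stand-ins for LLM-based decision models.
toward_goal = lambda obs: "move_forward" if obs["goal"] in obs["objects"] else "turn_left"
always_forward = lambda obs: "move_forward"
cautious = lambda obs: "turn_left"

scene = {"objects": ["door", "chair"], "goal": "door"}
print(step(scene, [toward_goal, always_forward, cautious]))  # -> move_forward
```

Note that ensemble voting trades token cost for robustness, consistent with the reported up-to-3.5× token overhead of Full Mode relative to single-model variants.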