g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks

Zihan Wang
Gim Hee Lee

Abstract

We introduce Generalizable 3D-Language Feature Fields (g3D-LF), a 3D representation model pre-trained on a large-scale 3D-language dataset for embodied tasks. Our g3D-LF processes posed RGB-D images from agents to encode feature fields for: 1) novel-view representation prediction from any position in the 3D scene; 2) generation of BEV maps centered on the agent; 3) querying targets using multi-granularity language within the above-mentioned representations. Our representation generalizes to unseen environments, enabling real-time construction and dynamic updates. By volume rendering latent features along sampled rays and integrating semantic and spatial relationships through multiscale encoders, our g3D-LF produces representations at different scales and perspectives, aligned with multi-granularity language via multi-level contrastive learning. Furthermore, we prepare a large-scale 3D-language dataset to align the representations of the feature fields with language. Extensive experiments on Vision-and-Language Navigation under both Panorama and Monocular settings, Zero-shot Object Navigation, and Situated Question Answering tasks highlight the significant advantages and effectiveness of our g3D-LF for embodied tasks. The code is available at https://github.com/MrZihan/g3D-LF.
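
The abstract mentions volume rendering latent features along sampled rays and querying the result with language. A minimal sketch of that general idea follows, assuming standard NeRF-style alpha compositing and a plain cosine-similarity query. It is not taken from the authors' code: the function names `render_ray_feature` and `cosine_similarity`, the array shapes, and the toy data are all hypothetical.

```python
import numpy as np

def render_ray_feature(features, densities, deltas):
    """Composite per-sample latent features along one ray (assumed NeRF-style).

    features:  (N, D) latent feature at each of N samples along the ray
    densities: (N,)   non-negative volume density at each sample
    deltas:    (N,)   spacing between consecutive samples
    """
    # Opacity contributed by each sample.
    alpha = 1.0 - np.exp(-densities * deltas)
    # Transmittance: fraction of the ray that reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    # Weighted sum of the sample features gives the rendered ray feature.
    return weights @ features, weights

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Toy ray: 8 samples with 4-dimensional features and one dense "surface" at index 3.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))
densities = np.array([0.0, 0.0, 0.0, 50.0, 0.0, 0.0, 0.0, 0.0])
deltas = np.full(8, 0.1)

rendered, weights = render_ray_feature(features, densities, deltas)

# Hypothetical stand-in for a text embedding that matches the surface feature.
text_embedding = features[3]
score = cosine_similarity(rendered, text_embedding)
```

In this toy setup almost all of the rendering weight falls on the single dense sample, so the rendered feature points in the same direction as that sample's feature and scores highly against a matching embedding. In a contrastive setup, scores of this kind would be compared across many ray and text pairs.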

Version published to 10.32388/841d1t
Dec 4, 2024

Related articles

Multimodal Supervisory Graphs for Persistent World Modeling in Generative AI

This article has 2 authors:
1. Marcus Elvain
2. Howard Pellorin
This article has no evaluations
Latest version Dec 31, 2025

MV-S2CD: A Modality-Bridged Vision Foundation Model-Based Framework for Unsupervised Optical–SAR Change Detection

This article has 8 authors:
1. Yongqi Shi
2. Ruopeng Yang
3. Changsheng Yin
4. Yiwei Lu
5. Bo Huang
6. Yongqi Wen
7. Yihao Zhong
8. Zhaoyang Gu
This article has no evaluations
Latest version Jan 31, 2026

TriORU2-Net++: Attention-Guided Three-Stage U2-Net++ for Light Field Occlusion Removal

This article has 5 authors:
1. Mostafa Farouk Senussi
2. Mahmoud Abdalla
3. Mahmoud SalahEldin Kasem
4. Mohamed Mahmoud
5. Hyun-Soo Kang
This article has no evaluations
Latest version Jan 19, 2026
