Enhancing Open-Vocabulary Scene Understanding via Push-Pull Alignment in Gaussian Splatting

Abstract

Open-vocabulary scene understanding based on 3D Gaussian Splatting (3DGS) has shown promising potential for applications such as embodied agents and object localization. By integrating open-vocabulary embeddings into spatial 3D Gaussians, these models enable a more comprehensive understanding of scenes. However, existing methods often suffer from misalignment due to the gap between the RGB and language modalities, leading to incorrect interpretations of similar-looking objects. To address this issue, we propose a cross-modal integration approach that aligns multiple representations through spatial Gaussian positioning. We introduce PPGS, a novel bimodal framework that bridges the RGB and language modalities through cohesive representation fields. Leveraging the illumination-invariant properties of language embeddings, we design the Bridge module, which employs surface reconstruction to provide refined geometric positions that act as a link between modalities. This module significantly enhances cross-modal alignment, improves high-fidelity rendering, and ensures accurate language feature embeddings for better modality fusion. Furthermore, our framework dynamically adjusts gradients based on the distinct optimization requirements of the RGB and language branches during joint learning, ensuring stable and efficient convergence. Comprehensive experiments demonstrate that PPGS achieves superior language-query accuracy and visual quality compared to existing language-embedded representations, improving mean Intersection over Union (mIoU) by 6% and yielding Peak Signal-to-Noise Ratio (PSNR) gains over mainstream methods, all within only 50% of the training time. Code repository: https://github.com/flybiubiu/PPGS.
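The abstract's idea of dynamically rebalancing gradients from the RGB and language objectives during joint optimization can be illustrated with a minimal sketch. This is a hypothetical toy example, not PPGS's actual implementation: the two loss functions, the normalization scheme, and all names are illustrative assumptions standing in for the real photometric and language-alignment losses.

```python
# Toy sketch of modality-aware gradient balancing during joint learning.
# Assumption: each modality's gradient is rescaled by its own magnitude
# so that neither objective dominates a shared parameter update.

def grad_rgb(x):
    # illustrative RGB photometric loss (x - 2)^2 -> gradient 2(x - 2)
    return 2.0 * (x - 2.0)

def grad_lang(x):
    # illustrative language-alignment loss 0.5*(x - 4)^2 -> gradient (x - 4)
    return x - 4.0

def joint_step(x, lr=0.1, eps=1e-8):
    g_rgb = grad_rgb(x)
    g_lang = grad_lang(x)
    # Normalize each modality's gradient by its magnitude, then average,
    # so the shared parameter follows a balanced descent direction.
    g = 0.5 * (g_rgb / (abs(g_rgb) + eps) + g_lang / (abs(g_lang) + eps))
    return x - lr * g

x = 0.0
for _ in range(200):
    x = joint_step(x)
# x settles between the two per-modality optima (2.0 and 4.0)
```

With raw (unnormalized) gradients, the steeper RGB loss would pull the shared parameter toward its own optimum; the per-modality rescaling keeps the update balanced, which is the intuition behind the abstract's claim of stable joint convergence.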