Dynamic4D: Enhancing Self-Supervised Learning for Robust and Fine-Grained 4D Point Cloud Video Understanding

Abstract

The proliferation of 4D point cloud videos highlights their potential, but the high cost of obtaining large-scale annotated data severely limits supervised methods. Consequently, self-supervised learning (SSL) is vital for learning generalizable representations from unlabeled 4D data. While existing SSL frameworks, such as Uni4D, have made progress, they often struggle with fine-grained motion understanding in highly dynamic scenes, with maintaining robustness under severe occlusion, and with developing explicit predictive capabilities. To address these challenges, we propose Dynamic4D, a novel and robust self-supervised framework tailored for dynamic 4D point cloud understanding. Dynamic4D introduces an Adaptive Causal Temporal Attention (ACTA) mechanism in the encoder for explicit causal temporal modeling and dynamic region-focused learning. Its decoder employs Motion Prediction Tokens (MPT) to directly infer motion vectors for masked regions. A novel adaptive motion-sensitive masking strategy further enhances robustness by intelligently prioritizing high-dynamic zones. Our multi-objective pre-training strategy integrates a new Dynamic Perception Loss alongside geometric reconstruction and latent-space alignment. Extensive experiments on diverse challenging benchmarks demonstrate that Dynamic4D consistently achieves state-of-the-art performance. It substantially outperforms prior methods, validating its superior capacity to learn highly robust, generalizable, and motion-aware representations for complex dynamic 4D point cloud scenes.
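The adaptive motion-sensitive masking strategy is only named in the abstract, so the following is a minimal sketch of one plausible realisation, not the paper's actual method: per-point motion magnitudes between consecutive frames are converted into sampling probabilities, so that high-dynamic regions are masked preferentially. The function name, the assumption that points are tracked across frames, and the softmax weighting are all illustrative assumptions.

```python
import numpy as np

def motion_sensitive_mask(frames, mask_ratio=0.6, temperature=1.0, rng=None):
    """Select points to mask, biased toward high-motion regions.

    frames: array of shape (T, N, 3), a point cloud video in which point i
    is assumed tracked across frames (an illustrative simplification).
    Returns a boolean array of shape (N,) marking masked points.
    """
    rng = np.random.default_rng(rng)
    T, N, _ = frames.shape
    # Per-point motion score: mean displacement between consecutive frames.
    disp = np.linalg.norm(np.diff(frames, axis=0), axis=-1)  # (T-1, N)
    motion = disp.mean(axis=0)                               # (N,)
    # Softmax over motion scores -> sampling probabilities favouring dynamic zones;
    # lower temperature concentrates masking on the most dynamic points.
    logits = motion / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    n_mask = int(round(mask_ratio * N))
    masked_idx = rng.choice(N, size=n_mask, replace=False, p=probs)
    mask = np.zeros(N, dtype=bool)
    mask[masked_idx] = True
    return mask

# Toy example: 4 frames, 100 points; points 0-9 drift along x, the rest are static.
frames = np.zeros((4, 100, 3))
frames[:, :10, 0] = np.arange(4)[:, None]
mask = motion_sensitive_mask(frames, mask_ratio=0.3, temperature=0.1, rng=0)
```

With a low temperature, the sampling probabilities of the ten moving points dominate, so they land in the masked set almost surely; the remaining masked points are drawn from the static background, mimicking the "prioritize high-dynamic zones" behaviour the abstract describes.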