Unsupervised Segmentation and Alignment of Multi-Demonstration Trajectories via Multi-Feature Saliency and Duration-Explicit HSMMs

Abstract

Learning from multiple demonstrations must contend with time warping, sensor noise, and alternation between quasi-stationary and transition phases. We propose a label-free pipeline that couples unsupervised segmentation, duration-explicit alignment, and probabilistic encoding. A dimensionless multi-feature saliency signal, combining velocity, acceleration, curvature, and direction-change rate, yields scale-robust keyframes via persistent peak–valley pairs and non-maximum suppression. A hidden semi-Markov model (HSMM) with explicit duration distributions is trained jointly across demonstrations to align trajectories on a shared semantic time base. Segment-level probabilistic motion models (GMM/GMR or ProMP, optionally combined with DMP) produce mean trajectories with calibrated covariances that interface directly with constrained planners. Feature weights are tuned without labels by minimizing cross-demonstration structural dispersion over the simplex via CMA-ES. Across UAV flight, autonomous driving, and robotic manipulation tasks, the method reduces phase-boundary dispersion relative to an HMM baseline by 31% on UAV-Sim and by 30–36% under monotone time warps, noise, and missing data; improves the sparsity–fidelity trade-off (higher time compression at comparable reconstruction error) with lower jerk; and attains nominal 2σ coverage (94–96%), indicating well-calibrated uncertainty. Ablations attribute the gains to persistence plus NMS, weight self-calibration, and duration-explicit alignment. The framework is scale-aware and computationally practical, and its uncertainty outputs feed directly into MPC/OMPL for risk-aware execution.
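As a rough, hypothetical illustration of the keyframe-selection step summarized above, the sketch below combines the four saliency features into a dimensionless signal and selects salient peaks, using SciPy's peak prominence as a stand-in for the persistence of peak–valley pairs and its minimum-distance argument as the non-maximum suppression. The function names, feature scalings, weights, and thresholds are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): dimensionless multi-feature saliency
# with prominence-based (persistence-style) peak filtering and non-maximum
# suppression. Assumes 2D or 3D positions sampled at a fixed rate dt.
import numpy as np
from scipy.signal import find_peaks

def multi_feature_saliency(x, dt, weights=(0.25, 0.25, 0.25, 0.25)):
    """x: (T, D) positions with D in {2, 3}; returns a dimensionless saliency signal of length T."""
    v = np.gradient(x, dt, axis=0)                 # velocity
    a = np.gradient(v, dt, axis=0)                 # acceleration
    speed = np.linalg.norm(v, axis=1)
    acc = np.linalg.norm(a, axis=1)
    # Curvature kappa = |v x a| / |v|^3 (3D), with the planar analogue for D = 2.
    if x.shape[1] == 3:
        cross = np.linalg.norm(np.cross(v, a), axis=-1)
    else:
        cross = np.abs(v[:, 0] * a[:, 1] - v[:, 1] * a[:, 0])
    curv = cross / np.maximum(speed, 1e-8) ** 3
    # Direction-change rate: angle between successive unit velocity vectors per dt.
    u = v / np.maximum(speed[:, None], 1e-8)
    dots = np.clip(np.einsum('td,td->t', u[:-1], u[1:]), -1.0, 1.0)
    dcr = np.concatenate([[0.0], np.arccos(dots) / dt])
    feats = np.stack([speed, acc, curv, dcr], axis=1)
    # Make each feature dimensionless via a robust per-feature scale.
    scale = np.maximum(np.median(np.abs(feats), axis=0), 1e-8)
    feats = feats / scale
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                # weights constrained to the simplex
    return feats @ w

def select_keyframes(saliency, min_persistence=1.0, min_separation=10):
    """Prominence approximates persistent peak-valley pairs; distance enforces NMS."""
    peaks, _ = find_peaks(saliency, prominence=min_persistence, distance=min_separation)
    return peaks
```

In the full pipeline the weight vector passed to such a saliency function would not be fixed but tuned label-free (e.g., by CMA-ES over the simplex, as stated in the abstract) to minimize cross-demonstration structural dispersion.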
