A New Paradigm for Human Motion Generation Based on Cross-Modal Nested Alignment

Abstract

This study addresses the problem of human motion synthesis in the absence of motion-capture data. A new paradigm is introduced for motion generation based on cross-modal nested alignment. The method includes a multi-scale semantic alignment module, which models natural language prompts and skeletal motion sequences in a nested manner at both local and global levels. In addition, temporal-spatial structural priors are incorporated to improve motion continuity and semantic accuracy. On the HumanML3D and T2M-Gen datasets, the proposed method improves the motion coverage metric by 12.1%, reduces motion smoothness error by 17.3%, and decreases the average inter-frame drift error by 13.5%. Compared with current mainstream models, it shows higher robustness in handling complex semantic prompts and generating long motion sequences. This study offers a new approach to motion generation driven by cross-modal alignment.
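The abstract does not specify how the nested local/global alignment is computed. As a rough, hypothetical sketch of the general idea (not the paper's actual objective), one common formulation aligns per-segment embeddings with an InfoNCE-style contrastive loss at the local level and pooled embeddings with a cosine term at the global level; the function names, the temperature `tau`, and the loss combination below are all assumptions:

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity matrix between two embedding sets."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def nested_alignment_loss(text_local, text_global, motion_local, motion_global, tau=0.07):
    """Toy nested (local + global) cross-modal alignment loss.

    text_local/motion_local:   (n_segments, d) per-segment embeddings
    text_global/motion_global: (d,) pooled sequence embeddings
    """
    # Local level: InfoNCE over matching text/motion segments
    logits = cosine_sim(text_local, motion_local) / tau
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(logits))
    local_loss = -log_probs[idx, idx].mean()
    # Global level: pull pooled embeddings together (1 - cosine similarity)
    global_loss = 1.0 - cosine_sim(text_global[None], motion_global[None])[0, 0]
    return local_loss + global_loss

# Synthetic example: motion embeddings are noisy copies of text embeddings,
# so the loss should be small but non-negative.
rng = np.random.default_rng(0)
text_local = rng.normal(size=(4, 8))
motion_local = text_local + 0.1 * rng.normal(size=(4, 8))
loss = nested_alignment_loss(text_local, text_local.mean(0),
                             motion_local, motion_local.mean(0))
print(float(loss))
```

In such a formulation, the local term enforces segment-level semantic correspondence while the global term keeps whole-sequence semantics aligned; the paper's actual module may differ substantially.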