A Multimodal Semantic Alignment Framework for Pedestrian Intention Recognition and Trajectory Prediction in Autonomous Driving
Abstract
Understanding complex pedestrian environments remains a critical challenge for autonomous driving systems. This study introduces a multimodal framework for pedestrian intention recognition that integrates visual and language features. A CLIP encoder jointly embeds in-vehicle camera images and textual labels of traffic scenes, and Bayesian uncertainty modeling is applied to assess the reliability of the recognition results. In addition, an improved Social-GRU network jointly predicts the trajectories of multiple pedestrians. Experiments on the Waymo dataset and publicly available pedestrian re-identification datasets show that the proposed framework improves intention classification accuracy by 6.8% and reduces the average displacement error (ADE) of trajectory prediction by 9.1%.
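To make the joint image-text embedding step concrete, the sketch below shows how a CLIP encoder can score a camera frame against textual intention labels in a shared embedding space. This is a minimal illustration, not the authors' code: it assumes the Hugging Face `transformers` CLIP implementation, the public `openai/clip-vit-base-patch32` checkpoint, hypothetical label texts, and a placeholder image path `frame.jpg`.

```python
# Minimal sketch: CLIP-based intention scoring via image-text similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical textual labels describing pedestrian intentions in the scene.
intention_labels = [
    "a pedestrian about to cross the street",
    "a pedestrian waiting at the curb",
    "a pedestrian walking along the sidewalk",
]

image = Image.open("frame.jpg")  # one in-vehicle camera frame (placeholder path)
inputs = processor(text=intention_labels, images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: scaled cosine similarities between the image embedding
# and each text embedding in the shared CLIP space.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(intention_labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```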
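The abstract does not specify how the Bayesian uncertainty modeling is formulated. One common approximation is Monte Carlo dropout, sketched below under that assumption: dropout is kept active at inference time, and the spread of repeated predictions from a small classification head (here sized for a 512-dimensional CLIP feature and three hypothetical intention classes) serves as an uncertainty estimate.

```python
# Minimal sketch: Monte Carlo dropout as an approximate Bayesian
# uncertainty estimate (an assumption; the paper's exact method is unstated).
import torch
import torch.nn as nn

head = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Dropout(0.3),
                     nn.Linear(128, 3))   # 3 hypothetical intention classes

def mc_dropout_predict(features, n_samples=30):
    head.train()                            # keep dropout stochastic at inference
    with torch.no_grad():
        samples = torch.stack([head(features).softmax(-1)
                               for _ in range(n_samples)])
    # Predictive mean and per-class standard deviation across samples.
    return samples.mean(0), samples.std(0)

mean_probs, uncertainty = mc_dropout_predict(torch.randn(1, 512))
print(mean_probs, uncertainty)
```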
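Finally, a minimal sketch of a Social-GRU-style predictor, assuming the usual encoder-pooling-decoder layout (the paper's improved variant is not public, so the architecture details here are placeholders). Each pedestrian's observed track is encoded by a shared GRU, all pedestrians' hidden states are mean-pooled as a simple stand-in for social pooling, and a decoder GRU rolls out future positions for all pedestrians jointly.

```python
# Minimal sketch of a Social-GRU-style multi-pedestrian trajectory predictor.
import torch
import torch.nn as nn

class SocialGRU(nn.Module):
    def __init__(self, hidden=64, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.embed = nn.Linear(2, hidden)            # (x, y) coordinates -> features
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.social = nn.Linear(2 * hidden, hidden)  # fuse self + pooled states
        self.decoder = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, 2)              # predicted (dx, dy) per step

    def forward(self, tracks):
        # tracks: (num_peds, obs_len, 2) observed positions in a shared frame.
        feats = self.embed(tracks)
        _, h = self.encoder(feats)                   # h: (1, num_peds, hidden)
        h = h.squeeze(0)
        # Social interaction: mean-pool hidden states across pedestrians
        # (a simple stand-in for the usual grid-based social pooling).
        pooled = h.mean(dim=0, keepdim=True).expand_as(h)
        h = torch.tanh(self.social(torch.cat([h, pooled], dim=-1)))
        # Autoregressive decoding of future positions.
        pos = tracks[:, -1, :]
        step = self.embed(pos)
        preds = []
        for _ in range(self.horizon):
            h = self.decoder(step, h)
            pos = pos + self.out(h)                  # integrate displacement
            preds.append(pos)
            step = self.embed(pos)
        return torch.stack(preds, dim=1)             # (num_peds, horizon, 2)

model = SocialGRU()
future = model(torch.randn(5, 8, 2))                 # 5 pedestrians, 8 observed steps
print(future.shape)                                  # torch.Size([5, 12, 2])
```

Training such a model against ground-truth futures with a mean squared error on positions directly optimizes the average displacement error reported in the abstract.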