V-PTP-IC: End-to-End Joint Modeling of Dynamic Scenes and Social Interactions for Pedestrian Trajectory Prediction from Vehicle-Mounted Cameras
Abstract
Pedestrian trajectory prediction from vehicle-mounted cameras is a safety-critical capability in intelligent transportation systems and autonomous driving, particularly in highly dynamic and visually complex urban traffic. In such scenarios, ego-motion-induced jitter, frequent occlusions, and diverse background motions jointly complicate the modeling of dynamic scene context and social interactions, both of which are critical for forecasting future trajectories. Existing approaches, often developed for fixed-camera or surveillance setups, lack robustness under these dynamic driving conditions. We present V-PTP-IC (Vehicle-view Pedestrian Trajectory Prediction with Interaction Considerations), an end-to-end framework that jointly models dynamic scene context and social interactions. The framework employs SORT-based multi-object tracking to initialize pedestrian trajectories and SIFT-based static-keypoint matching for ego-motion compensation and trajectory stabilization. A VGG19-based dynamic scene encoder captures evolving environmental layouts, while a Social-LSTM module models spatiotemporal dependencies among pedestrians. A unified feature-fusion strategy balances the two modalities to generate accurate, diverse, and socially compliant trajectory forecasts. Extensive experiments on the in-vehicle JAAD dataset show that V-PTP-IC reduces the average displacement error (ADE) by 22.2 and the final displacement error (FDE) by 25.8 compared with state-of-the-art baselines. These results confirm the framework's ability to balance prediction accuracy, diversity, and robustness, offering a scalable solution for autonomous driving in dynamically changing real-world environments.
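The abstract gives no implementation details, but the ego-motion compensation step it describes (SIFT static-keypoint matching used to stabilize pedestrian trajectories against camera jitter) maps naturally onto a standard OpenCV pipeline. The sketch below illustrates that step under this assumption; the function name stabilize_points, the Lowe ratio threshold, and the RANSAC reprojection tolerance are illustrative choices, not taken from the paper.

import cv2
import numpy as np

def stabilize_points(prev_frame, curr_frame, curr_points):
    """Re-project pixel points from curr_frame into prev_frame coordinates,
    cancelling the apparent background motion induced by the ego-vehicle."""
    gray1 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    # Detect SIFT keypoints and descriptors in both frames.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(gray1, None)
    kp2, des2 = sift.detectAndCompute(gray2, None)

    # Lowe's ratio test keeps only distinctive matches, which tend to lie
    # on static structures (buildings, road markings) rather than on
    # moving pedestrians or vehicles.
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    if len(good) < 4:  # a homography needs at least 4 correspondences
        return np.asarray(curr_points, dtype=np.float32)

    src = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects residual matches on moving objects, so H approximates
    # the background (camera-motion) transform between the two frames.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return np.asarray(curr_points, dtype=np.float32)

    pts = np.float32(curr_points).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

In the full framework, the stabilized coordinates of SORT tracklets would then feed the VGG19 scene branch and the Social-LSTM interaction branch before fusion; those components are not reproduced here.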