V-PTP-IC: End-to-End Joint Modeling of Dynamic Scenes and Social Interactions for Pedestrian Trajectory Prediction from Vehicle-Mounted Cameras
Abstract
Pedestrian trajectory prediction from a vehicle-mounted perspective is essential for autonomous driving in complex urban environments, yet it remains challenging due to ego-motion jitter, frequent occlusions, and scene variability. Existing approaches, largely developed for static surveillance views, struggle to cope with continuously shifting viewpoints. To address these issues, we propose V-PTP-IC, an end-to-end framework that stabilizes motion, models inter-agent interactions, and fuses multi-modal cues for trajectory prediction. The system integrates Simple Online and Realtime Tracking (SORT)-based tracklet augmentation, Scale-Invariant Feature Transform (SIFT)-assisted ego-motion compensation, graph-based interaction reasoning, and multi-head attention fusion, followed by Long Short-Term Memory (LSTM) decoding. Experiments on the JAAD and PIE datasets show that V-PTP-IC substantially outperforms existing baselines, reducing average displacement error (ADE) by 27.23% and 25.73% and final displacement error (FDE) by 33.88% and 32.85%, respectively. These results advance dynamic scene understanding for safer autonomous systems.
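The abstract does not spell out how the SIFT-assisted ego-motion compensation step works. The sketch below is an illustrative assumption of one common way to implement such a step with OpenCV: match SIFT keypoints between consecutive frames, robustly fit an affine camera-motion model with RANSAC, and invert it to map observed pedestrian positions into a stabilized coordinate frame. The function names and the choice of an affine model are hypothetical, not the authors' implementation.

```python
import cv2
import numpy as np

def estimate_ego_motion(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Estimate a 2x3 affine transform describing apparent background
    motion between two consecutive vehicle-camera frames (assumed model)."""
    sift = cv2.SIFT_create()
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)

    # Match descriptors and keep distinctive matches (Lowe's ratio test).
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC discards keypoints on independently moving pedestrians and
    # vehicles, so the recovered transform reflects ego motion only.
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    return M

def compensate_positions(points: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Map observed pixel positions back into the previous frame's
    coordinates, removing the motion component caused by the ego vehicle."""
    M_inv = cv2.invertAffineTransform(M)
    pts = points.reshape(-1, 1, 2).astype(np.float32)
    return cv2.transform(pts, M_inv).reshape(-1, 2)
```

Applying such a compensation to each tracklet before interaction reasoning would let the downstream graph and attention modules operate on pedestrian motion that is decoupled from camera jitter, which is consistent with the stabilization role the abstract assigns to this component.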