V-PTP-IC: End-to-End Joint Modeling of Dynamic Scenes and Social Interactions for Pedestrian Trajectory Prediction from Vehicle-Mounted Cameras

Abstract

Pedestrian trajectory prediction from a vehicle-mounted perspective is essential for autonomous driving in complex urban environments, yet it remains challenging due to ego-motion jitter, frequent occlusions, and scene variability. Existing approaches, largely developed for static surveillance views, struggle to cope with continuously shifting viewpoints. To address these issues, we propose V-PTP-IC, an end-to-end framework that stabilizes motion, models inter-agent interactions, and fuses multi-modal cues for trajectory prediction. The system integrates Simple Online and Realtime Tracking (SORT)-based tracklet augmentation, Scale-Invariant Feature Transform (SIFT)-assisted ego-motion compensation, graph-based interaction reasoning, and multi-head attention fusion, followed by Long Short-Term Memory (LSTM) decoding. Experiments on the JAAD and PIE datasets demonstrate that V-PTP-IC substantially outperforms existing baselines, reducing Average Displacement Error (ADE) by 27.23% and 25.73% and Final Displacement Error (FDE) by 33.88% and 32.85%, respectively. These results advance dynamic scene understanding for safer autonomous driving systems.
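
To make the ego-motion compensation step concrete, below is a minimal Python/OpenCV sketch of the general idea behind SIFT-assisted stabilization: match SIFT keypoints between consecutive frames, estimate the camera-induced background motion with RANSAC, and warp past pedestrian positions into the current frame's coordinate system. This is not the authors' implementation; the function names (estimate_ego_motion, compensate_track) and the 0.75 ratio-test threshold are illustrative assumptions.

import cv2
import numpy as np

def estimate_ego_motion(prev_gray, curr_gray):
    """Estimate a 3x3 homography describing background motion
    between two grayscale frames using SIFT matches + RANSAC."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_gray, None)
    kp2, des2 = sift.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return np.eye(3)  # no features found; assume no motion

    # Lowe's ratio test keeps only distinctive, reliable matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < 4:
        return np.eye(3)  # too few matches to fit a homography

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H if H is not None else np.eye(3)

def compensate_track(points, H):
    """Warp past pedestrian positions (N x 2 pixel coordinates) into the
    current frame so the track reflects pedestrian motion, not ego-motion."""
    pts = np.float32(points).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

In a pipeline of this kind, the stabilized coordinates would then feed the downstream interaction-reasoning and LSTM-decoding stages, so that the learned dynamics are not confounded by camera jitter.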
