A Comparative Analysis of a Multi-Headed Self-Attention Mechanism-Based Transformer Model for Bus Travel Time Prediction Using Heterogeneous Datasets Across Multiple Bus Routes
Abstract
Accurate travel time prediction is critical for improving transit service delivery and can increase passenger use and satisfaction. To date, many bus travel time prediction models have been limited to small networks because they perform poorly in large, densely populated urban areas with complex traffic patterns and long-range dependencies. This study introduces a deep learning-based single-step, multi-station forecasting framework for predicting average bus travel times across multiple routes, stops, and trips using heterogeneous data sources (GTFS records and vehicle probe data) collected over one week in Saint Louis, Missouri. A univariate Transformer neural network model with multi-headed self-attention was developed to estimate mean hourly travel times. Its performance was compared with multivariate GRU and LSTM models, as well as Historical Average and XGBoost benchmark models. Using five hours of historical data to predict the next hour, the proposed Transformer achieved the lowest minimum and mean MAPE values (4.32% and 8.29%) and performed consistently well during both peak and off-peak periods. While XGBoost delivered the fastest computation time (6.28 seconds), the Transformer remained competitive (7.42 seconds) and offered higher accuracy. These results demonstrate the Transformer’s suitability and scalability for real-time bus travel time prediction in large, complex urban transit networks.
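As a rough illustration of the forecasting setup described above, the sketch below shows a minimal encoder-only Transformer with multi-headed self-attention for univariate, single-step prediction: a five-hour window of mean hourly travel times is mapped to a next-hour estimate. This is a PyTorch sketch under stated assumptions, not the authors' implementation; the model dimensions (d_model, n_heads, n_layers) are illustrative, since the abstract does not report the paper's hyperparameters.

```python
# Minimal sketch (not the authors' implementation): an encoder-only,
# multi-headed self-attention Transformer for univariate single-step
# forecasting. Five past hourly mean travel times predict the next hour.
# d_model, n_heads, and n_layers are illustrative assumptions.
import math
import torch
import torch.nn as nn

class TravelTimeTransformer(nn.Module):
    def __init__(self, seq_len=5, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)  # univariate input -> model dim
        # Fixed sinusoidal positional encoding for the 5-step history window.
        pos = torch.arange(seq_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=128, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)  # single-step (next-hour) output

    def forward(self, x):
        # x: (batch, 5, 1) -- five hourly mean travel times per series
        h = self.input_proj(x) + self.pe   # add positional information
        h = self.encoder(h)                # multi-headed self-attention layers
        return self.head(h[:, -1, :])      # predict hour t+1

# Usage example: predict the next hourly mean travel time from a 5-hour window.
model = TravelTimeTransformer()
window = torch.rand(8, 5, 1)   # batch of 8 route/stop series (dummy data)
next_hour = model(window)      # shape (8, 1)
```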