A massive, graph-augmented traffic dataset for machine learning and deep learning spatio-temporal traffic analysis
Abstract
This paper introduces a large-scale, high-resolution traffic dataset collected from over 5,000 sensors across the city of Madrid, Spain, spanning the ten-year period from 2015 to 2024. Comprising more than 1.5 billion records, the dataset includes key traffic metrics such as intensity, occupancy, and average speed, aggregated at 15-minute intervals. It also provides two distinct graph-based spatial representations of the sensor network, enabling advanced modelling of urban mobility. The dataset is designed to support a wide range of machine learning tasks, including spatio-temporal forecasting, representation learning, and transfer learning, while also serving broader applications in urban planning and smart city development. Because it comes from a European city, it offers a valuable alternative to existing traffic datasets, which are predominantly sourced from California, helping to improve model generalization and reduce regional bias. Additionally, the traffic data can serve as a covariate in related domains such as air quality control, supporting multi-modal approaches to urban sustainability.
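Because the records are aggregated at 15-minute intervals, one common way to frame spatio-temporal forecasting on data like this is to slide a fixed window over the time axis and pair each stretch of history with the steps that follow it. The sketch below assumes that one metric (for example, intensity) has already been loaded as a time × sensor grid. The `make_windows` helper, the in-memory layout, and the choice of a 3-hour history with a 1-hour horizon are illustrative assumptions, not part of the dataset's published format or the authors' method.

```python
from typing import List, Sequence, Tuple

# Assumed (hypothetical) layout: series[t][s] is one metric for sensor s at
# 15-minute step t. The dataset's actual file format is not assumed here.
Series = Sequence[Sequence[float]]

STEPS_PER_HOUR = 60 // 15  # 15-minute aggregation -> 4 steps per hour


def make_windows(
    series: Series, history: int, horizon: int
) -> List[Tuple[List[List[float]], List[List[float]]]]:
    """Split a (time, sensor) series into (input, target) forecasting pairs.

    Each pair uses `history` consecutive steps as model input and the
    following `horizon` steps as the prediction target. Windows slide
    forward one step at a time; too-short series yield no pairs.
    """
    if history <= 0 or horizon <= 0:
        raise ValueError("history and horizon must be positive")
    pairs = []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        x = [list(row) for row in series[start : start + history]]
        y = [list(row) for row in series[start + history : start + history + horizon]]
        pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # One day of synthetic readings for 3 sensors: 24 h * 4 steps = 96 steps.
    day = [[float(t + s) for s in range(3)] for t in range(24 * STEPS_PER_HOUR)]
    # Use 3 hours (12 steps) of history to predict the next hour (4 steps).
    pairs = make_windows(day, history=3 * STEPS_PER_HOUR, horizon=STEPS_PER_HOUR)
    print(len(pairs))  # 96 - 12 - 4 + 1 = 81
```

Keeping the sensor dimension intact inside each window is what lets a graph-based model combine the time axis with one of the spatial representations of the sensor network; a purely per-sensor model would instead slice out a single column.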