A massive, graph augmented, traffic dataset for machine learning and deep learning spatio-temporal traffic analysis


Abstract

This paper introduces a large-scale, high-resolution traffic dataset collected from over 5,000 sensors across the city of Madrid, Spain, spanning a ten-year period, from 2015 to 2024. Comprising more than 1.5 billion records, the dataset includes key traffic metrics such as intensity, occupancy, and average speed, aggregated at 15-minute intervals. It also features two distinct graph-based spatial representations of the sensor network, enabling advanced modelling of urban mobility. The dataset is designed to support a wide range of machine learning tasks, including spatio-temporal forecasting, representation learning, and transfer learning, while also serving broader applications in urban planning and smart city development. Its geographic diversity offers a valuable alternative to existing datasets predominantly sourced from California, enhancing model generalization and reducing regional bias. Additionally, traffic data from this dataset can be used as a covariate in related domains such as air quality control, supporting multi-modal approaches to urban sustainability.
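As a rough plausibility check on the stated scale, the sketch below computes how many records a network would produce if every sensor reported one row per 15-minute interval for the whole 2015 to 2024 period. The one-row-per-sensor-per-interval layout, the round figure of 5,000 sensors, and the function name are assumptions made for illustration; the abstract does not specify the record schema.

```python
from datetime import date

# Number of 15-minute intervals in one day.
INTERVALS_PER_DAY = 24 * 60 // 15


def full_coverage_records(n_sensors, start, end_exclusive):
    """Records produced if every sensor reports in every 15-minute interval.

    Hypothetical helper: assumes one record per sensor per interval.
    """
    days = (end_exclusive - start).days
    return n_sensors * days * INTERVALS_PER_DAY


# Assumed inputs: exactly 5,000 sensors, 1 Jan 2015 through 31 Dec 2024.
upper = full_coverage_records(5000, date(2015, 1, 1), date(2025, 1, 1))
print(f"{upper:,} records at full coverage")
```

Under these assumptions, full coverage comes to about 1.75 billion records, the same order of magnitude as the "more than 1.5 billion" reported. A gap between the two figures would be expected if some sensors were installed later, retired, or had missing intervals, though the abstract does not say which applies.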
