Beyond Attention: Hierarchical Mamba Models for Scalable Spatiotemporal Traffic Forecasting
Abstract
Traffic forecasting in cellular networks is a challenging spatiotemporal prediction problem due to strong temporal dependencies, spatial heterogeneity across cells, and the need for scalability to large network deployments. Traditional cell-specific models incur prohibitive training and maintenance costs, while global models often fail to capture heterogeneous spatial dynamics. Recent spatiotemporal architectures based on attention or graph neural networks improve accuracy but introduce high computational overhead, limiting their applicability in large-scale or real-time settings. We propose HiSTM (Hierarchical SpatioTemporal Mamba), a spatiotemporal forecasting architecture built on state-space modeling. HiSTM combines spatial convolutional encoding for local neighborhood interactions with Mamba-based temporal modeling to capture long-range dependencies, followed by attention-based temporal aggregation for prediction. The hierarchical design enables representation learning with linear computational complexity in sequence length and supports both grid-based and correlation-defined spatial structures. Cluster-aware extensions incorporate spatial regime information to handle heterogeneous traffic patterns. Experimental evaluation on large-scale real-world cellular datasets demonstrates that HiSTM outperforms strong baselines in accuracy. On the Milan dataset, HiSTM reduces MAE by 29.4% compared to STN, while achieving the lowest RMSE and highest R² score among all evaluated models. In multi-step autoregressive forecasting, HiSTM maintains 36.8% lower MAE than STN and 11.3% lower MAE than STTRE at the 6-step horizon, with a 58% slower error accumulation rate than STN. On the unseen Trentino dataset, HiSTM achieves a 47.3% MAE reduction over STN and demonstrates stronger cross-dataset generalization. A single HiSTM model outperforms 10,000 independently trained cell-specific LSTMs, demonstrating the advantage of joint spatiotemporal learning.
HiSTM maintains best-in-class performance with up to 30% missing data, outperforming all baselines under various missing-data scenarios. The model achieves these results while being 45× smaller than PredRNN++ and 18× smaller than xLSTM, and it maintains a competitive inference latency of 1.19 ms, showcasing its effectiveness for scalable 5G/6G traffic prediction in resource-constrained environments.
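The claim of linear complexity in sequence length comes from the state-space recurrence at the core of Mamba-style temporal modeling: each step updates a fixed-size hidden state, so the full scan costs O(T). The toy sketch below illustrates that recurrence with a plain (non-selective, time-invariant) linear SSM in NumPy; the function name, shapes, and parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear-time state-space recurrence:
        h_t = A @ h_{t-1} + B @ x_t,   y_t = C @ h_t
    x: (T, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state).
    Each step is O(1) in T, so the whole sequence scan is O(T)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t   # fixed-size state carries long-range context
        ys.append(C @ h)
    return np.stack(ys)

# Toy usage: 6-step sequence, scalar input/output, 4-dim hidden state.
rng = np.random.default_rng(0)
T, d_state = 6, 4
A = 0.9 * np.eye(d_state)              # stable diagonal transition
B = rng.standard_normal((d_state, 1))
C = rng.standard_normal((1, d_state))
y = ssm_scan(rng.standard_normal((T, 1)), A, B, C)
print(y.shape)  # (6, 1)
```

In Mamba proper, A, B, and C are input-dependent (the "selective" mechanism) and the scan is computed with a parallel hardware-aware kernel, but the per-step fixed-size state, and hence the linear scaling that the abstract contrasts with quadratic attention, is the same.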