Learning to Route in Time and Frequency Domains: A Dual-Domain MoE Transformer for Multi-Horizon Forecasting
Abstract
Accurate long-term electrical load forecasting is essential for reliable smart grid operation, but remains challenging due to multi-scale periodic behaviors and non-stationary temporal variations across different prediction horizons. This paper proposes MoE-Transformer, a dual-domain forecasting framework that learns to route representations in both time and frequency domains through reinforcement learning. To address spectral misalignment in multi-step prediction, an Extended Discrete Fourier Transform (Extended DFT) is introduced to align the input spectrum with the frequency grid of the full forecasting window. The model integrates parallel Mixture-of-Experts modules in the time and frequency domains (T-MoE and F-MoE), whose domain-specific experts model complementary temporal dynamics and spectral structures. Expert routing in each domain is formulated as an independent Markov Decision Process and optimized via reinforcement learning to jointly account for forecasting accuracy, routing consistency, and balanced expert utilization. Extensive experiments on five benchmark datasets, including ETTh1, Electricity, and Traffic, across four forecasting horizons demonstrate that MoE-Transformer consistently outperforms state-of-the-art methods, achieving MSE reductions of 50.9–56.9%. Sparse expert activation reduces memory consumption by 40% and inference latency by 60%, indicating suitability for real-time forecasting scenarios. Ablation studies attribute performance improvements of 5.8%, 4.6%, and up to 47.2% to Extended DFT, dual-domain modeling, and reinforcement-learning-based routing, respectively.
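The spectral-alignment idea can be illustrated with a small sketch. The abstract does not give the Extended DFT formula, so the code below assumes one plausible reading: the length-L input is transformed on the frequency grid k / (L + H) of the full window (input plus an H-step forecast) rather than on its own grid k / L. The function names `extended_dft` and `plain_dft`, and the values of `L`, `H`, and `period`, are hypothetical and chosen only for illustration.

```python
import cmath
import math


def extended_dft(x, horizon):
    """DFT of a length-L input evaluated on the frequency grid of a
    length-(L + horizon) window. This is an assumed reading of
    "Extended DFT", not the paper's confirmed definition."""
    L = len(x)
    T = L + horizon          # full window: input plus forecast
    n_bins = T // 2 + 1      # non-negative frequencies of a real signal
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / T) for n in range(L))
        for k in range(n_bins)
    ]


def plain_dft(x):
    """Ordinary DFT on the input's own length-L grid, for comparison."""
    return extended_dft(x, horizon=0)


# Hypothetical setup: 96 input steps, 24-step horizon, one cycle per 24 steps.
L, H, period = 96, 24, 24
x = [math.cos(2 * math.pi * n / period) for n in range(L)]

spec_in = plain_dft(x)          # bins spaced 1/96 apart
spec_ext = extended_dft(x, H)   # bins spaced 1/120 apart

# Index of the strongest bin on each grid.
peak_in = max(range(len(spec_in)), key=lambda k: abs(spec_in[k]))
peak_ext = max(range(len(spec_ext)), key=lambda k: abs(spec_ext[k]))
print(peak_in, peak_ext)
```

In this example the same 24-step cycle falls on bin 4 of the input grid (4/96) but on bin 5 of the full-window grid (5/120). Under the stated assumption, this index mismatch is the kind of misalignment that transforming the input directly on the full-window grid would avoid.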