Statistically Validated Multi-Horizon Electricity Load Forecasting with Weather-Augmented Machine Learning under Walk-Forward Evaluation
Abstract
Short-term electricity load forecasting is essential for maintaining grid reliability, supporting generation scheduling, and enabling efficient operation of modern energy systems. This study develops a weather-augmented multi-horizon framework for electricity demand prediction. It uses hourly load observations from the ENTSO-E Transparency Platform combined with meteorological data from the NASA POWER dataset. Four forecasting approaches are evaluated within a unified benchmarking framework: Seasonal Naïve persistence, SARIMAX, Gradient Boosting Regression (GBR), and a weather-augmented GBR variant that incorporates exogenous meteorological covariates. In addition, the probabilistic DeepAR model, implemented in GluonTS, is included as a distributional reference for uncertainty-aware comparison. Forecast performance is assessed at three operationally relevant horizons (t + 1, t + 24, and t + 168 hours) using expanding-window, rolling-origin walk-forward validation. The feature set includes calendar encodings, autoregressive lag features, rolling demand statistics, and wind-related meteorological indicators. Results show that the Gradient Boosting models consistently outperform the statistical and persistence baselines at all forecasting horizons. Horizon-specific Diebold–Mariano tests further indicate that weather augmentation yields statistically significant improvements primarily at the longer forecast horizons. Across 17 walk-forward evaluation folds, the best-performing configuration achieved a mean RMSE of 88.44 MW (± 29.62 MW) and a mean MAE of 66.69 MW. Probabilistic evaluation gave an empirical coverage of 0.759 for the nominal 80% prediction interval. This is slightly below the 0.80 target and indicates moderate under-coverage, meaning the intervals are somewhat too narrow.
These findings highlight the effectiveness of feature-driven ensemble methods for structured electricity demand forecasting and demonstrate the value of statistically validated multi-horizon benchmarking frameworks for operational load prediction under limited-data conditions.
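The expanding-window, rolling-origin evaluation and the Seasonal Naïve baseline can be illustrated with a minimal Python sketch. It rests on several assumptions. The function names (`expanding_window_folds`, `seasonal_naive`, `walk_forward_seasonal_naive`), the fold sizes, and the season lengths are hypothetical and are not the study's reported configuration. Each fold trains on all data before a cut point and scores the next non-overlapping block. Fold-level RMSE and MAE are then averaged across folds.

```python
import math
from statistics import mean, stdev


def expanding_window_folds(n_obs, initial_train, test_size, n_folds):
    """Return (train_end, test_end) pairs for expanding-window, rolling-origin folds.

    Fold k trains on indices [0, train_end) and evaluates on [train_end, test_end).
    The training window always starts at 0 and grows by test_size per fold.
    """
    folds = []
    for k in range(n_folds):
        train_end = initial_train + k * test_size
        test_end = train_end + test_size
        if test_end > n_obs:  # stop once a full test block no longer fits
            break
        folds.append((train_end, test_end))
    return folds


def seasonal_naive(y, target_idx, horizon, season):
    """h-step seasonal naive forecast: repeat the latest value of the same seasonal
    phase that is already observed at the forecast origin (target_idx - horizon)."""
    lag = season * math.ceil(horizon / season)  # smallest multiple of season >= horizon
    if target_idx - lag < 0:
        raise ValueError("not enough history for this horizon/season")
    return y[target_idx - lag]


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def walk_forward_seasonal_naive(y, horizon, season, initial_train, test_size, n_folds):
    """Return (mean fold RMSE, std of fold RMSE, mean fold MAE); needs >= 2 folds."""
    fold_rmse, fold_mae = [], []
    for train_end, test_end in expanding_window_folds(len(y), initial_train, test_size, n_folds):
        # Each test point j is forecast from origin j - horizon, so the origin rolls
        # through the test block and no value after the origin is used.
        errors = [y[j] - seasonal_naive(y, j, horizon, season) for j in range(train_end, test_end)]
        fold_rmse.append(rmse(errors))
        fold_mae.append(mae(errors))
    return mean(fold_rmse), stdev(fold_rmse), mean(fold_mae)
```

For hourly load, `season=24` captures the daily cycle and `season=168` captures the weekly cycle. A call such as `walk_forward_seasonal_naive(load, horizon=168, season=168, initial_train=24 * 7 * 8, test_size=24 * 7, n_folds=17)` would match the 17-fold design in form only, because the abstract does not report the window sizes.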
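A horizon-specific Diebold–Mariano comparison can be sketched as follows. This is the original asymptotic form of the test. It uses a squared-error loss differential and a rectangular long-run variance window covering lags 1 through h-1. The abstract does not state which loss or which variant the study used (for example, the Harvey–Leybourne–Newbold small-sample correction with a Student-t reference). Both choices here are therefore assumptions, and the function name `diebold_mariano` is hypothetical.

```python
import math


def diebold_mariano(e1, e2, horizon=1, power=2):
    """Diebold-Mariano test of equal predictive accuracy for two aligned error series.

    Loss differential d_t = |e1_t|**power - |e2_t|**power (power=2 gives squared error).
    The long-run variance of d uses autocovariances up to lag horizon - 1 with a
    rectangular window, reflecting the MA(h-1) structure of optimal h-step errors.
    Returns (statistic, two-sided p-value) under the asymptotic N(0, 1) null.
    A positive statistic means model 1 has the larger average loss.
    """
    n = len(e1)
    if n != len(e2) or n < 2:
        raise ValueError("error series must be aligned and have length >= 2")
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(e1, e2)]
    d_bar = sum(d) / n

    def autocov(k):
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        # The rectangular-window estimate is not guaranteed to be positive.
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value
```

At long horizons such as t + 168, the rectangular window sums many autocovariances and can produce a non-positive variance. The sketch rejects that case rather than silently reporting a statistic. Common alternatives include a Bartlett (Newey–West) kernel or the small-sample correction mentioned above.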
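Empirical interval coverage, such as the reported 0.759 for a nominal 80% interval, is the fraction of observations that fall inside the forecast interval. A minimal sketch follows. It assumes the 80% interval is the central one between the 10% and 90% quantiles of Monte Carlo sample paths, which is how a sampling model like DeepAR is often summarized. The abstract does not say how the interval was formed, and the function names are hypothetical.

```python
import math


def empirical_quantile(samples, q):
    """Quantile of a sample using linear interpolation between order statistics."""
    s = sorted(samples)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def central_interval(sample_paths, level=0.8):
    """Per-step central interval from sample paths; sample_paths[i][t] is path i at step t."""
    alpha = (1.0 - level) / 2.0  # 0.1 and 0.9 quantiles when level=0.8
    lower, upper = [], []
    for t in range(len(sample_paths[0])):
        draws = [path[t] for path in sample_paths]
        lower.append(empirical_quantile(draws, alpha))
        upper.append(empirical_quantile(draws, 1.0 - alpha))
    return lower, upper


def interval_coverage(y, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    inside = sum(1 for v, lo, hi in zip(y, lower, upper) if lo <= v <= hi)
    return inside / len(y)
```

Coverage below the nominal level means the intervals are too narrow, so the model is overconfident. Coverage above the nominal level means the intervals are wider than necessary.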
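The feature families listed in the abstract (calendar encodings, autoregressive lags, rolling demand statistics, and a wind covariate) can be assembled for a direct h-step forecast without look-ahead leakage, as in the sketch below. Several choices here are assumptions for illustration: the lag set, the 24-hour rolling window, the sine/cosine calendar encoding, and the use of wind observed at the forecast origin. The function name `make_features` is also hypothetical. The study may, for example, use weather at the target hour instead. Only lags of at least h are kept, because shorter lags are not yet observed at the forecast origin.

```python
import math
from datetime import datetime, timedelta
from statistics import mean, pstdev


def make_features(y, start, target_idx, horizon, lags=(1, 2, 24, 48, 168, 336),
                  window=24, wind=None):
    """Feature dict for a direct h-step forecast of hourly load y[target_idx].

    Load-derived features use only data up to the forecast origin
    (target_idx - horizon); calendar features describe the target hour,
    which is known in advance.
    """
    origin = target_idx - horizon
    usable_lags = [k for k in lags if k >= horizon]  # y[target_idx - k] is known at origin
    earliest = min([origin - window + 1] + [target_idx - k for k in usable_lags])
    if earliest < 0:
        raise ValueError("not enough history for the requested lags/window")

    ts = start + timedelta(hours=target_idx)  # assumes gap-free hourly data from `start`
    feats = {
        "hour_sin": math.sin(2 * math.pi * ts.hour / 24),
        "hour_cos": math.cos(2 * math.pi * ts.hour / 24),
        "dow_sin": math.sin(2 * math.pi * ts.weekday() / 7),
        "dow_cos": math.cos(2 * math.pi * ts.weekday() / 7),
    }
    for k in usable_lags:
        feats[f"lag_{k}"] = y[target_idx - k]
    recent = y[origin - window + 1: origin + 1]  # last `window` observed values
    feats[f"roll_mean_{window}"] = mean(recent)
    feats[f"roll_std_{window}"] = pstdev(recent)
    if wind is not None:
        # Hypothetical exogenous covariate: wind speed observed at the forecast origin.
        feats["wind_at_origin"] = wind[origin]
    return feats
```

Under this design the t + 1 model sees every lag, while the t + 168 model keeps only lags of 168 hours and longer. That asymmetry is one reason different horizons can behave differently under the same feature set.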