Statistically Validated Multi-Horizon Electricity Load Forecasting with Weather-Augmented Machine Learning under Walk-Forward Evaluation

Abstract

Short-term electricity load forecasting is essential for maintaining grid reliability, supporting generation scheduling, and enabling efficient operation of modern energy systems. This study develops a weather-augmented multi-horizon forecasting framework for electricity demand using hourly load observations from the ENTSO-E Transparency Platform combined with meteorological data from the NASA POWER dataset. Four forecasting approaches are evaluated within a unified benchmarking framework: Seasonal Naïve persistence, SARIMAX, Gradient Boosting Regression (GBR), and a weather-augmented GBR variant that incorporates exogenous meteorological covariates. In addition, the probabilistic DeepAR neural forecasting model, implemented in GluonTS, is included as a distributional reference model for uncertainty-aware comparison. Forecast performance is assessed at three operationally relevant horizons (t + 1, t + 24, and t + 168 hours) using expanding-window, rolling-origin walk-forward validation. The feature set includes calendar encodings, autoregressive lag features, rolling demand statistics, and wind-related meteorological indicators. Results show that the Gradient Boosting models consistently outperform the statistical and persistence baselines at all forecasting horizons. Horizon-specific Diebold–Mariano tests further indicate that weather augmentation yields statistically significant improvements primarily at the longer horizons. Across 17 walk-forward evaluation folds, the best-performing configuration achieved a mean RMSE of 88.44 MW (± 29.62 MW) and a mean MAE of 66.69 MW. Probabilistic evaluation produced an empirical coverage of 0.759 for the 80% prediction intervals. This is moderate under-coverage relative to the nominal level and indicates that the intervals are somewhat too narrow.
These findings highlight the effectiveness of feature-driven ensemble methods for structured electricity demand forecasting and demonstrate the value of statistically validated multi-horizon benchmarking frameworks for operational load prediction under limited-data conditions.
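The evaluation protocol described above can be illustrated with a minimal Python sketch of expanding-window, rolling-origin walk-forward evaluation for the Seasonal Naïve baseline at the t + 1, t + 24, and t + 168 horizons. Only the fold count of 17 mirrors the abstract. Everything else is a hypothetical assumption for illustration: the function names, the weekly season of 168 hours, the eight-week initial training window, the one-week test block per fold, the synthetic load series, and the residual-quantile prediction intervals. These intervals are a simple stand-in and are not the DeepAR intervals used in the study.

```python
import math
import numpy as np


def expanding_window_folds(n_obs, initial, test_size, n_folds):
    """Yield (train_end, test_end) index pairs for expanding-window folds.

    Training data is always y[:train_end], so the window grows by
    `test_size` each fold; the evaluation block is y[train_end:test_end].
    """
    for k in range(n_folds):
        train_end = initial + k * test_size
        test_end = train_end + test_size
        if test_end > n_obs:
            break  # not enough data left for a full test block
        yield train_end, test_end


def seasonal_naive_lag(horizon, season):
    """Smallest multiple of `season` that is >= `horizon`, so the copied
    value is already observed at the forecast origin (target - horizon)."""
    return season * math.ceil(horizon / season)


def evaluate_seasonal_naive(y, horizon, season=168, initial=24 * 7 * 8,
                            test_size=24 * 7, n_folds=17, alpha=0.2):
    """Walk-forward evaluation of a seasonal naive forecaster at one horizon.

    Returns per-fold RMSE, MAE and the empirical coverage of a (1 - alpha)
    interval built from quantiles of training-window residuals
    (hypothetical stand-in for a probabilistic model's intervals).
    """
    y = np.asarray(y, dtype=float)
    lag = seasonal_naive_lag(horizon, season)
    rmse, mae, coverage = [], [], []
    for train_end, test_end in expanding_window_folds(len(y), initial,
                                                      test_size, n_folds):
        # Residuals of the same rule, computed on the training window only.
        train_idx = np.arange(lag, train_end)
        resid = y[train_idx] - y[train_idx - lag]
        lo_q, hi_q = np.quantile(resid, [alpha / 2, 1 - alpha / 2])

        test_idx = np.arange(train_end, test_end)
        y_hat = y[test_idx - lag]  # value one season-multiple earlier
        err = y[test_idx] - y_hat
        rmse.append(np.sqrt(np.mean(err ** 2)))
        mae.append(np.mean(np.abs(err)))
        inside = (y[test_idx] >= y_hat + lo_q) & (y[test_idx] <= y_hat + hi_q)
        coverage.append(np.mean(inside))
    return np.array(rmse), np.array(mae), np.array(coverage)


# Hypothetical synthetic hourly load: daily and weekly cycles plus noise.
rng = np.random.default_rng(0)
t = np.arange(24 * 7 * 26)  # 26 weeks of hourly observations
load = (1000 + 150 * np.sin(2 * np.pi * t / 24)
        + 80 * np.sin(2 * np.pi * t / 168) + rng.normal(0, 30, t.size))

for h in (1, 24, 168):
    rmse, mae, cov = evaluate_seasonal_naive(load, horizon=h)
    print(f"t+{h}: folds={rmse.size} RMSE={rmse.mean():.1f} "
          f"(± {rmse.std():.1f}) MAE={mae.mean():.1f} "
          f"80% coverage={cov.mean():.3f}")
```

Training always starts at index 0, and each forecast copies a value at least `horizon` steps before its target. As a result, every prediction uses only data available at its forecast origin, which is the property walk-forward validation is meant to enforce. A trained model such as GBR or SARIMAX could, in principle, replace the seasonal copy inside the same fold loop by being refit on `y[:train_end]`.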
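The horizon-specific significance claims rest on the Diebold–Mariano test of equal predictive accuracy. The abstract does not state which variant was used, so the following NumPy sketch labels every choice as an assumption. It uses a two-sided test on squared-error loss and a Bartlett (Newey–West) long-run variance with h − 1 lags, because h-step-ahead errors are typically autocorrelated up to lag h − 1. It uses an asymptotic standard normal reference with no Harvey–Leybourne–Newbold small-sample correction. The function name `diebold_mariano` is hypothetical.

```python
import math
import numpy as np


def diebold_mariano(e1, e2, horizon=1, power=2):
    """Two-sided Diebold-Mariano test of equal predictive accuracy.

    e1, e2: forecast errors of two models on the same targets, same order.
    Loss is |e| ** power (power=2 gives squared error). A negative statistic
    means model 1 has lower average loss. The long-run variance of the loss
    differential uses Bartlett weights 1 - k/horizon for lags 1..horizon-1.
    Returns (statistic, p_value) against the asymptotic N(0, 1) reference.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    d = np.abs(e1) ** power - np.abs(e2) ** power  # loss differential
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar
    lrv = np.dot(dc, dc) / n  # lag-0 autocovariance
    for k in range(1, horizon):
        gamma_k = np.dot(dc[k:], dc[:-k]) / n
        lrv += 2.0 * (1.0 - k / horizon) * gamma_k
    if lrv <= 0:
        raise ValueError("loss differential has zero estimated variance")
    stat = d_bar / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # = 2 * (1 - Phi(|stat|))
    return float(stat), float(p_value)


# Hypothetical example: model 1 has a smaller error spread than model 2.
rng = np.random.default_rng(42)
e_model1 = rng.normal(0.0, 1.0, 500)
e_model2 = rng.normal(0.0, 1.5, 500)
stat, p = diebold_mariano(e_model1, e_model2, horizon=24)
print(f"DM = {stat:.2f}, two-sided p = {p:.3g}")  # DM < 0 favours model 1
```

In a walk-forward setting, `e1` and `e2` would typically be the pooled out-of-sample errors of two competing models at a single horizon, such as the weather-augmented and baseline GBR. The test would then be repeated for each horizon, with `horizon` set to match.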
