Development of a Machine Learning-Based Clustering Framework for Energy Management on a University Campus
Abstract
The increasing demand for reliable and efficient energy distribution in educational institutions necessitates the adoption of intelligent energy management systems. This research develops a machine learning-based framework for load management on a university campus, using the University of Lagos as a case study because of its metropolitan setting. The study estimates hourly power consumption for individual buildings and transformers and applies clustering algorithms to group buildings according to their energy consumption patterns. A dataset comprising 3,648 hourly timestamps across 55 occupied structures was collected over a half-year period and analyzed. Building-level load estimation models were first developed to establish hourly consumption profiles. Thereafter, several clustering techniques were evaluated: K-Means, Hierarchical Clustering, Gaussian Mixture Models (GMM), Spectral Clustering, Mini-Batch K-Means, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Among these, Mini-Batch K-Means achieved the best performance, segmenting buildings into three groups: high-, medium-, and low-demand clusters. The algorithm achieved a Silhouette Score of 0.461 (on a scale of -1 to 1, where higher values indicate more distinct clusters), a Davies–Bouldin Index of 0.767 (where lower values represent better clustering), and a Calinski–Harabasz Index of 42.0 (where higher scores indicate better-separated clusters). Given the duration of the dataset, short-term load forecasting (STLF) was performed using Meta’s Prophet, Seasonal Autoregressive Integrated Moving Average (SARIMA), Autoregressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models on both the whole-campus series and the cluster-specific series.
ARIMA produced the lowest point-forecast errors in all evaluations: on the whole-campus series, it achieved a Mean Absolute Percentage Error (MAPE) of 7.2% and a Root Mean Square Error (RMSE) of 118.7. On the cluster-specific series, cluster 0 had MAPE = 3.8%, cluster 1 had MAPE = 4.6% (RMSE = 44.7), and cluster 2 had MAPE = 5.4%. This pattern of lower cluster-level error was consistent across all evaluated algorithms. These quantitative results identify ARIMA as the preferred baseline for point forecasting on this dataset and confirm that consumption-based clustering is essential for achieving consistent, large reductions in both relative and absolute forecast error. Overall, this study demonstrates the feasibility of applying machine learning to institutional load management, offering a scalable and adaptable framework for other university campus environments.
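The Silhouette Score quoted in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the function name `mean_silhouette`, the two features per building (mean and peak load in kW), and the four data points are all hypothetical, chosen only to show how the score rewards compact, well-separated groups. In practice a library routine such as scikit-learn's `silhouette_score` would normally be used instead.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1).

    For each point i: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to the members of any other cluster,
    and s(i) = (b - a) / max(a, b). Singleton clusters contribute s(i) = 0.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    # Map each cluster label to the indices of its members.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical buildings described by (mean load kW, peak load kW).
buildings = [(10.0, 20.0), (12.0, 22.0), (100.0, 180.0), (105.0, 190.0)]

good_labels = [0, 0, 1, 1]  # low-demand vs high-demand buildings
bad_labels = [0, 1, 0, 1]   # deliberately mixes low and high demand

good_score = mean_silhouette(buildings, good_labels)
bad_score = mean_silhouette(buildings, bad_labels)
print(f"sensible grouping: {good_score:.3f}")
print(f"mixed grouping:    {bad_score:.3f}")
```

The sensible grouping scores close to 1, while the mixed grouping scores below 0, which is the sense in which a higher value indicates more distinct clusters.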
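The two forecast error measures reported in the abstract, MAPE and RMSE, can be computed as in the minimal sketch below. The function names and the four hourly load values are hypothetical and are not taken from the study's data; the sketch also assumes that no actual value is zero, since MAPE is undefined in that case.

```python
import math


def mape(actual, forecast):
    """Mean Absolute Percentage Error, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Square Error, in the same units as the series."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical hourly loads (kW) and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0, 500.0]
forecast = [110.0, 190.0, 380.0, 525.0]

print(f"MAPE = {mape(actual, forecast):.2f}%")  # MAPE = 6.25%
print(f"RMSE = {rmse(actual, forecast):.2f}")   # RMSE = 17.50
```

MAPE is relative and unit-free, whereas RMSE carries the units of the series, so a cluster with a smaller total load will tend to show a smaller RMSE regardless of forecast quality. This is one reason the abstract reports both relative and absolute error.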