Explainable multi-output ensemble learning for early-stage prediction of building heating and cooling loads
Abstract
Early-stage design decisions largely determine a building’s operational energy demand and the capacity of heating, ventilation and air-conditioning (HVAC) equipment. This paper presents an interpretable, multi-output machine learning workflow that predicts heating load (HL) and cooling load (CL) from geometric and envelope parameters available in concept design. Using the Energy Efficiency (ENB2012) dataset (768 simulated building configurations), a linear Ridge baseline is compared with Random Forest and Extremely Randomised Trees (Extra Trees) under five-fold cross-validation and a held-out test set. The proposed Extra Trees model achieves test-set RMSE of 0.60 for HL and 1.45 for CL, with coefficients of determination (R²) of 0.996 and 0.977, respectively. Permutation feature importance, partial dependence and scenario-based what-if analysis provide transparent drivers and expected load deltas, enabling rapid option screening before committing to detailed dynamic simulation. Results indicate that overall height, roof area and glazing area dominate predictive accuracy, consistent with building-physics intuition and published studies.

Practical Application
The workflow provides building-services engineers with fast, auditable estimates of HL and CL from early building information modelling (BIM) or concept parameters. It supports preliminary HVAC capacity sizing, prioritisation of envelope changes (e.g., glazing and height), and rapid screening of alternative concepts before time-consuming dynamic simulation. Explainability outputs (feature importance, sensitivity curves and what-if deltas) can be attached to design reviews to justify decisions and document assumptions.
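The workflow summarised above (a Ridge baseline against a multi-output Extra Trees model, five-fold cross-validation, a held-out test set, permutation importance and a what-if delta) might be sketched in Python with scikit-learn roughly as follows. This is an illustration only, not the authors' code: the data are a synthetic stand-in for ENB2012, the four feature names are an illustrative subset, and the coefficients, the 80/20 split, `n_estimators=200` and `alpha=1.0` are all assumptions made so the example runs without downloading anything. The numbers it prints therefore say nothing about the results reported in the abstract.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for ENB2012 (768 rows); not the real dataset.
rng = np.random.default_rng(0)
FEATURES = ["overall_height", "roof_area", "glazing_area", "orientation"]
n = 768
X = np.column_stack([
    rng.choice([3.5, 7.0], n),               # overall_height
    rng.uniform(110.0, 220.0, n),            # roof_area
    rng.choice([0.0, 0.10, 0.25, 0.40], n),  # glazing_area (ratio)
    rng.integers(2, 6, n).astype(float),     # orientation code (no effect here)
])
# ASSUMPTION: made-up linear relationships plus noise, for illustration only.
hl = 4.0 * X[:, 0] - 0.05 * X[:, 1] + 20.0 * X[:, 2] + rng.normal(0, 0.5, n)
cl = 3.5 * X[:, 0] - 0.04 * X[:, 1] + 15.0 * X[:, 2] + rng.normal(0, 1.0, n)
Y = np.column_stack([hl, cl])  # two targets: [HL, CL]

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "extra_trees": ExtraTreesRegressor(n_estimators=200, random_state=0),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # Five-fold CV on the training split (RMSE averaged over both outputs).
    cv_rmse = -cross_val_score(
        model, X_tr, Y_tr, cv=cv, scoring="neg_root_mean_squared_error"
    )
    model.fit(X_tr, Y_tr)
    pred = model.predict(X_te)
    # Per-output test metrics: index 0 is HL, index 1 is CL.
    rmse = np.sqrt(mean_squared_error(Y_te, pred, multioutput="raw_values"))
    r2 = r2_score(Y_te, pred, multioutput="raw_values")
    scores[name] = {"cv_rmse": cv_rmse.mean(), "rmse": rmse, "r2": r2}
    print(name, scores[name])

# Permutation importance on the test set (drop in mean R² across both outputs).
et = models["extra_trees"]
perm = permutation_importance(et, X_te, Y_te, n_repeats=10, random_state=0)
ranked = [FEATURES[i] for i in np.argsort(perm.importances_mean)[::-1]]
print("importance ranking:", ranked)

# What-if scenario: raise the glazing ratio from 0.10 to 0.25 on one design.
baseline = np.array([[7.0, 150.0, 0.10, 3.0]])
variant = baseline.copy()
variant[0, 2] = 0.25
delta = et.predict(variant) - et.predict(baseline)  # shape (1, 2): [dHL, dCL]
print("load deltas [HL, CL]:", delta[0])
```

`ExtraTreesRegressor` accepts a two-column target directly, so a single fitted model returns HL and CL together. Passing `multioutput="raw_values"` keeps the two error figures separate rather than averaging them, which matches the per-load reporting in the abstract.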