Active Learning for Physics-Informed Digital Twins of Integrated Thermal Energy Distribution Systems
Abstract
The Thermal Energy Distribution System (TEDS) at Idaho National Laboratory (INL) provides a unique experimental platform for testing advanced supervisory control strategies in hybrid energy systems that combine renewable, nuclear, and thermal energy storage (TES) resources. Real-time control of such systems requires surrogate models that are accurate, interpretable, and uncertainty-aware. This work presents an active learning (AL) framework that integrates high-fidelity Modelica simulations with physics-informed and data-driven surrogates to construct a digital twin (DT) of the TEDS glycol heat exchanger (GHX) subsystem. Four surrogate variants are examined: the deterministic Sparse Identification of Nonlinear Dynamics with Control (SINDyC), its probabilistic multivariate-Gaussian extension (MvG-SINDyC), a feedforward neural network (FNN), and a gated recurrent unit (GRU) network. These models are trained to reproduce the GHX transient dynamics and are compared in terms of predictive accuracy, interpretability, and computational efficiency. The AL loop iteratively selects the most informative simulation trajectories, accelerating convergence and reducing the training demand relative to random sampling. Two model-specific query strategies underpin the framework: Mahalanobis-distance sampling in the coefficient space for MvG-SINDyC and error-based sampling in the prediction space for SINDyC, FNN, and GRU. Across both GHX outputs, the bypass mass flow rate $\dot{m}_{\mathrm{GHX}}$ and the heat transfer rate $Q_{\mathrm{GHX}}$, AL substantially improves data efficiency, achieving comparable accuracy with as few as one-fifth of the trajectories required by random sampling. Among the evaluated surrogates, the GRU network achieves the highest predictive fidelity, with root mean square errors (RMSE) below 0.003 kg/s and 1 W, respectively.
The deterministic SINDyC model remains the lightest and fastest to train, while its probabilistic extension (MvG-SINDyC) provides uncertainty quantification through multivariate-Gaussian inference and exhibits the largest computational gains under AL. The FNN surrogate shows overfitting tendencies, particularly without experimental supervision. Overall, the proposed AL-driven workflow offers a scalable pathway for constructing adaptive, interpretable, and uncertainty-aware digital twins for real-time supervisory control of complex energy systems.
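The two query strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the scoring formulas, so this is one plausible reading, not the authors' implementation. It assumes that each candidate trajectory yields a coefficient vector which is scored by its Mahalanobis distance from the current multivariate-Gaussian coefficient estimate, and that error-based sampling ranks candidates by per-trajectory RMSE against the reference simulation. The function names (`mahalanobis_scores`, `error_scores`, `select_queries`) and the array shapes are hypothetical.

```python
import numpy as np


def mahalanobis_scores(candidate_coefs, mean, cov):
    """Mahalanobis distance of each candidate coefficient vector from N(mean, cov).

    candidate_coefs: (n_candidates, n_coefs); mean: (n_coefs,); cov: (n_coefs, n_coefs).
    Assumed scoring rule for the coefficient-space strategy (hypothetical).
    """
    diff = candidate_coefs - mean
    # Solve cov @ x = diff.T instead of forming an explicit inverse.
    solved = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.einsum("ij,ij->i", diff, solved))


def error_scores(predicted, reference):
    """Per-trajectory RMSE between surrogate predictions and reference simulations.

    Both arrays have shape (n_trajectories, n_steps, n_outputs).
    Assumed scoring rule for the prediction-space strategy (hypothetical).
    """
    return np.sqrt(np.mean((predicted - reference) ** 2, axis=(1, 2)))


def select_queries(scores, n_queries):
    """Indices of the n_queries highest-scoring candidates, highest first."""
    return np.argsort(scores)[::-1][:n_queries]


# Toy usage with made-up numbers: three candidates, two coefficients.
mean = np.zeros(2)
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
candidates = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
scores = mahalanobis_scores(candidates, mean, cov)  # distances 1, 2, 0
chosen = select_queries(scores, n_queries=2)        # indices 1, then 0
```

In an AL loop built on this sketch, the selected indices would identify the next trajectories to simulate and add to the training set before the surrogate is refitted. Solving the linear system rather than inverting the covariance is a numerical-stability choice and is not drawn from the source.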