Evaluation of stochastic trajectory-based epidemic models using the energy score
Abstract
Scoring rules are critical for evaluating the predictive performance of epidemic models because they quantify how well projections and forecasts align with observed data. In this study, we introduce the energy score as a robust performance metric for stochastic trajectory-based epidemic models. As a multivariate extension of the continuous ranked probability score (CRPS), the energy score provides a single, unified measure for time-series predictions. It evaluates both calibration and sharpness by combining the distances between individual trajectories and the observed data with the variability among trajectories. We provide an overview of how the energy score can be applied to assess both scenario projections and forecasts in trajectory format, with a particular focus on a detailed analysis of the Scenario Modeling Hub results for the 2023-2024 influenza season. By comparing the energy score to the widely used weighted interval score (WIS), we demonstrate its utility for evaluating epidemic models, especially when predictions for multiple target outcomes must be integrated into a single, interpretable metric.
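The two components described above, distance to the observations and spread among trajectories, can be made concrete with the standard sample estimator of the energy score, ES = (1/m) Σ_i ||X_i − y|| − (1/(2m²)) Σ_i Σ_j ||X_i − X_j||. The sketch below is a minimal illustration that assumes m equally weighted trajectories observed on the same time points as the data, Euclidean distance, and an exponent of 1. The function names are hypothetical and not taken from any Scenario Modeling Hub code.

```python
import math
from typing import Sequence

Trajectory = Sequence[float]


def euclidean(a: Trajectory, b: Trajectory) -> float:
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_score(trajectories: Sequence[Trajectory], observed: Trajectory) -> float:
    """Sample energy score of an ensemble of m trajectories (lower is better).

    ES = (1/m) * sum_i ||X_i - y|| - (1/(2 m^2)) * sum_i sum_j ||X_i - X_j||
    Hypothetical helper: equal trajectory weights, Euclidean norm, exponent 1.
    """
    m = len(trajectories)
    if m == 0:
        raise ValueError("need at least one trajectory")
    for t in trajectories:
        if len(t) != len(observed):
            raise ValueError("each trajectory must match the observed series length")
    # Accuracy term: mean distance from each trajectory to the observed series.
    accuracy = sum(euclidean(x, observed) for x in trajectories) / m
    # Spread term: half the mean pairwise distance over all ordered pairs (i, j).
    spread = sum(euclidean(xi, xj) for xi in trajectories for xj in trajectories) / (2 * m * m)
    return accuracy - spread


if __name__ == "__main__":
    # Three hypothetical trajectories over three weeks and the observed series.
    observed = [10.0, 12.0, 15.0]
    ensemble = [[9.0, 11.0, 14.0], [11.0, 13.0, 17.0], [10.0, 12.5, 15.5]]
    print(energy_score(ensemble, observed))
```

The double sum makes the spread term quadratic in the number of trajectories. With a single trajectory the spread term vanishes, and for a series of length one the estimator reduces to the usual ensemble estimate of the CRPS, which is the sense in which the energy score generalizes it.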
Author summary
Epidemic model predictions are often evaluated using scoring rules, such as the weighted interval score (WIS), which require outputs in interval or quantile formats. However, epidemic models often produce outputs as collections of stochastic trajectories, which must then be summarized into quantiles for evaluation. In this study, we introduce the energy score as a scoring metric that evaluates stochastic trajectories directly, without requiring conversion to other formats. The energy score provides a rigorous assessment by accounting for both the variability among trajectories and their alignment with observed data. Using publicly available data, we demonstrate that the energy score is a reliable and effective metric for evaluating epidemic model predictions in their native stochastic trajectory format.
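For contrast, the quantile route mentioned above can be sketched for a single time point. Trajectory values are summarized into empirical quantiles, and the WIS is computed from the median and a set of central prediction intervals. This minimal sketch assumes the common WIS definition with weights w0 = 1/2 and w_k = α_k/2, normalized by K + 1/2, and a linear-interpolation quantile rule. The function names and the α levels in the usage lines are hypothetical and not the hub's actual configuration.

```python
import math
from typing import Sequence


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation sample quantile (the 'type 7' definition)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def interval_score(lower: float, upper: float, y: float, alpha: float) -> float:
    """Interval score of a central (1 - alpha) prediction interval."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # penalty for undershooting
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # penalty for overshooting
    return score


def wis_from_samples(samples: Sequence[float], y: float, alphas: Sequence[float]) -> float:
    """WIS after summarizing trajectory values at one time point into quantiles."""
    median = empirical_quantile(samples, 0.5)
    total = 0.5 * abs(y - median)  # median term, weight w0 = 1/2
    for a in alphas:
        lower = empirical_quantile(samples, a / 2)
        upper = empirical_quantile(samples, 1 - a / 2)
        total += (a / 2) * interval_score(lower, upper, y, a)  # weight w_k = alpha_k / 2
    return total / (len(alphas) + 0.5)


if __name__ == "__main__":
    # Values of five hypothetical trajectories at one week, and the observed count.
    week_values = [120.0, 135.0, 150.0, 160.0, 190.0]
    print(wis_from_samples(week_values, 142.0, alphas=[0.2, 0.5, 0.8]))
```

Because each time point is scored from its own marginal quantiles, this pipeline cannot see how values are linked along an individual trajectory, whereas the energy score takes whole trajectories as its unit of comparison.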