Mind the Baseline: The Hidden Impact of Reference Model Selection on Forecast Assessment

Abstract

Baseline models are essential reference points for evaluating forecasting methods, yet their selection often receives insufficient attention. We present a systematic framework for baseline model selection in epidemiological forecasting, establishing criteria for suitable baselines and demonstrating the consequences of different choices. Analysing data from COVID-19 and influenza forecast hubs, we evaluated ten baseline model frameworks. Our results reveal that baseline selection profoundly impacts forecast evaluation: for influenza, the proportion of models outperforming the baseline ranged from 10% to 93% depending on the baseline chosen. No single baseline satisfied all evaluation criteria. The choice of baseline also affected model rankings, with some baselines producing substantially different orderings of forecast model performance. We found that well-calibrated baselines do not necessarily align with good forecast performance, highlighting a fundamental tension in baseline selection. These findings underscore the need for careful baseline selection in forecast evaluation, particularly in collaborative efforts where fair comparison across multiple models is essential. We provide practical recommendations for baseline selection and suggest strategies for improving evaluation fairness when ideal baselines cannot be identified.
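The headline result, that the share of models judged to outperform the baseline can swing from 10% to 93%, follows from a simple mechanism: the same fixed set of model scores is compared against a movable reference score. A minimal sketch of that comparison, using invented scores and assuming a lower-is-better scoring rule (such as the weighted interval score), not the paper's actual data:

```python
# Hypothetical illustration: how the choice of baseline changes the share
# of models judged to "outperform the baseline". All scores below are
# invented; lower score = better forecast (as with WIS).

def share_outperforming(model_scores, baseline_score):
    """Fraction of models with a strictly better (lower) mean score
    than the baseline."""
    beats = [name for name, s in model_scores.items() if s < baseline_score]
    return len(beats) / len(model_scores)

# Invented mean scores for ten forecast models.
models = {
    f"model_{i}": s
    for i, s in enumerate(
        [0.62, 0.71, 0.78, 0.83, 0.88, 0.92, 0.97, 1.05, 1.18, 1.31]
    )
}

# Two hypothetical baselines: a weak one (e.g. a naive flat-line model)
# and a much stronger one (e.g. a well-tuned seasonal model).
weak_baseline = 1.30
strong_baseline = 0.65

print(share_outperforming(models, weak_baseline))    # 0.9
print(share_outperforming(models, strong_baseline))  # 0.1
```

With the weak baseline, 9 of 10 models appear to add value; with the strong one, only 1 of 10 does, even though the models themselves have not changed. This is the tension the abstract describes.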
