Gaussian Process Emulation for Exploring Complex Infectious Disease Models
Abstract
Epidemiological models that aim for a high degree of biological realism by simulating every individual in a population are unavoidably complex, with many free parameters, which makes systematic exploration of their dynamics computationally challenging. This study investigates the potential of Gaussian Process emulation to overcome this obstacle. To simulate disease dynamics, we developed an abstract individual-based model that is loosely inspired by dengue and incorporates key features shaping dengue epidemics, such as social structure, human movement, and seasonality. We trained three Gaussian Process surrogate models on three outcomes: outbreak probability, maximum incidence, and epidemic duration. These surrogate models enable the rapid prediction of outcomes at any point in the eight-dimensional parameter space of the original model. Our analysis revealed that average infectivity and average human mobility are key drivers of these epidemiological metrics, while the seasonal timing of the first infection can influence the course of the outbreak. We then used a dataset comprising more than 1,000 dengue epidemics observed over 12 years in Colombia to calibrate our Gaussian Process model and evaluate its predictive power. The calibrated Gaussian Process model identifies a subset of municipalities with consistently higher average infectivity estimates, which show notable overlap with previously reported dengue disease clusters, suggesting that statistical emulation can facilitate empirical data analysis. Overall, this work underscores the potential of Gaussian Process emulation to enable the use of more complex individual-based models in epidemiology, allowing a higher degree of realism and accuracy, which should improve our ability to control diseases of public health concern.
Author Summary
Detailed individual-based models can capture a high degree of realism, but their complexity often makes them too slow or cumbersome to explore fully. In our work, we explore how Gaussian Process emulation, a statistical method for building fast, accurate surrogate models, can help overcome this challenge. First, we developed an individual-based model that simulates disease spread in a population, accounting for features such as social structure, human mobility, and seasonal variation in infection risk. We then trained a Gaussian Process surrogate model on the outputs of this individual-based model, which allowed us to predict key outcomes almost instantly across a wide range of parameter values. This approach made it possible to systematically explore which factors drive simulated epidemics. We found that two variables, average infectivity and average mobility, had the greatest influence on whether and how outbreaks occurred. Our results demonstrate that Gaussian Process emulation offers a practical and powerful way to study complex disease systems. While we applied this approach to infectious disease transmission, the underlying method can be useful for analyzing many other types of detailed, simulation-based models.
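As a rough illustration of the emulation idea described above, the following Python sketch fits a Gaussian Process to a small number of runs of a toy simulator and then predicts the outcome at new parameter values. Everything in it is an assumption made for illustration: `toy_simulator` is an invented two-parameter stand-in for the individual-based model (which has eight parameters), the squared-exponential kernel and its fixed hyperparameters are arbitrary choices, and the training design is plain uniform random sampling. It is not the authors' implementation.

```python
import numpy as np


def toy_simulator(theta):
    """Hypothetical stand-in for one slow individual-based model run.

    theta[:, 0] plays the role of average infectivity and theta[:, 1] of
    average mobility, both rescaled to [0, 1]. The response is an invented
    smooth function, not the model from the paper.
    """
    infectivity, mobility = theta[:, 0], theta[:, 1]
    return 1.0 / (1.0 + np.exp(-(6.0 * infectivity + 3.0 * mobility - 4.0)))


def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between two sets of points."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def fit_gp(x_train, y_train, noise=1e-6):
    """Precompute the Cholesky factor and weights used for prediction."""
    y_mean = y_train.mean()  # constant mean function (an assumption)
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    return {"x": x_train, "chol": chol, "alpha": alpha, "y_mean": y_mean}


def predict_gp(gp, x_new):
    """Return the posterior mean and standard deviation at x_new."""
    k_star = rbf_kernel(x_new, gp["x"])
    mean = gp["y_mean"] + k_star @ gp["alpha"]
    v = np.linalg.solve(gp["chol"], k_star.T)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v**2, axis=0)
    # Clip tiny negative variances caused by floating-point round-off.
    return mean, np.sqrt(np.clip(var, 0.0, None))


rng = np.random.default_rng(0)

# "Expensive" step: run the simulator at a modest number of design points.
x_train = rng.uniform(0.0, 1.0, size=(60, 2))
y_train = toy_simulator(x_train)
gp = fit_gp(x_train, y_train)

# "Cheap" step: query the emulator anywhere in the parameter space.
x_test = rng.uniform(0.0, 1.0, size=(200, 2))
pred_mean, pred_sd = predict_gp(gp, x_test)

# Compare against the true simulator, which is only possible for a toy model.
mean_abs_error = float(np.mean(np.abs(pred_mean - toy_simulator(x_test))))
print(f"mean absolute emulator error: {mean_abs_error:.4f}")
```

In practice the kernel hyperparameters would be estimated from the training runs rather than fixed, and the predictive standard deviation `pred_sd` gives a sense of where the emulator is least certain and additional simulator runs would be most informative.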