Surveillance sentinel selection strategy based on emergence probability and network topology
Abstract
Background
Although disease emergence probability has been shown to vary geographically, how this spatial heterogeneity can be leveraged to optimize sentinel site selection for early outbreak detection on complex networks remains largely unexplored.
Methods
We simulated outbreaks on synthetic modular networks varying in topology, emergence probability distribution, and pathogen transmissibility, and quantified early detection performance as the average reduction in outbreak size achieved when an outbreak is detected by a given sentinel set. A genetic algorithm (GA) was first used to identify optimal sentinel sets and to understand the characteristics potentially relevant to early detection performance. These characteristics were then used to train a Random Forest-based Surrogate Model (RFSM) to predict the relative ranking of node selections and to assess the relative importance of different network features. We further benchmarked RFSM against five alternative strategies on synthetic scale-free and empirical networks to evaluate its generalizability. Finally, sensitivity analyses were conducted to examine how feature importance patterns varied with network size, mean degree, degree heterogeneity, within-module connection probability, kurtosis of the emergence probability distribution, degree-probability correlation, and basic reproduction number.
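As an illustration of the detection metric described above, the following minimal sketch estimates the average reduction in outbreak size for a given sentinel set. It is not the authors' implementation. It rests on assumptions not stated in the abstract: a discrete-time SIR process with a one-step infectious period, a per-contact transmission probability `beta`, an outbreak origin drawn in proportion to each node's emergence probability, and reduction measured as the fraction of the final outbreak size that had not yet occurred at the time of first detection. The names `simulate_outbreak` and `detection_performance` are hypothetical.

```python
import random


def simulate_outbreak(adj, seed_node, beta, rng):
    """Run a discrete-time SIR outbreak (assumed one-step infectious period).

    Returns a dict mapping each infected node to its infection time step.
    """
    infection_time = {seed_node: 0}
    frontier = [seed_node]
    t = 0
    while frontier:
        t += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                # Each infectious node gets one chance to infect each
                # still-susceptible neighbour.
                if v not in infection_time and rng.random() < beta:
                    infection_time[v] = t
                    next_frontier.append(v)
        frontier = next_frontier
    return infection_time


def detection_performance(adj, emergence_prob, sentinels, beta,
                          n_runs=1000, seed=0):
    """Average fractional reduction in outbreak size for a sentinel set.

    Assumption: detection happens at the step the first sentinel is infected,
    and the outbreak is halted at the end of that step.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    weights = [emergence_prob[n] for n in nodes]
    total = 0.0
    for _ in range(n_runs):
        # Draw the outbreak origin in proportion to emergence probability.
        origin = rng.choices(nodes, weights=weights, k=1)[0]
        times = simulate_outbreak(adj, origin, beta, rng)
        final_size = len(times)
        hit_times = [times[s] for s in sentinels if s in times]
        if hit_times:
            t_detect = min(hit_times)
            size_at_detection = sum(1 for t in times.values() if t <= t_detect)
        else:
            # The outbreak never reached a sentinel: no reduction.
            size_at_detection = final_size
        total += (final_size - size_at_detection) / final_size
    return total / n_runs


# Toy example: a five-node path where outbreaks always emerge at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
emergence = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}
near = detection_performance(path, emergence, [0], beta=1.0, n_runs=10)
far = detection_performance(path, emergence, [4], beta=1.0, n_runs=10)
```

In this toy case a sentinel at the emergence site averts most of the outbreak, whereas a sentinel at the far end of the path detects it only after every node is infected, which is the kind of contrast a sentinel selection strategy tries to exploit.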
Results
Surveillance strategies incorporating emergence probability outperformed those based solely on network topology, though the improvement was modest. RFSM performed comparably to the GA and the greedy algorithm while requiring only about 1/24,000 of the GA's computational time and 1/100 of the greedy algorithm's for a network with 200 nodes. Dynamic selection features capturing the overlap among sentinel sites were the most important for early detection performance, followed by global topology-related features, such as network density and degree distribution skewness, and then node topology-related features, such as the degree, betweenness, and eigenvector centrality of the node. The importance of emergence probability-related features increased with greater degree heterogeneity, higher kurtosis of the emergence probability distribution, and a negative correlation between node degree and emergence probability. Selecting only six sentinels achieved approximately 90% of the performance of full-network surveillance.
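The greedy baseline and the importance of overlap among sentinel sites can be illustrated with a generic sketch. The abstract does not specify how the greedy algorithm was implemented, so the version below is an assumption: it repeatedly adds the candidate with the largest marginal gain in a user-supplied score. The function name `greedy_select`, the toy coverage score, and the node labels are all hypothetical and stand in for the simulation-based detection performance used in the study.

```python
def greedy_select(candidates, score, k):
    """Pick up to k sentinels, adding the best marginal-gain node each round."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        base = score(selected)
        # Marginal gain accounts for overlap with sentinels already chosen.
        best = max(remaining, key=lambda n: score(selected + [n]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy score: number of distinct regions "watched" by a sentinel set
# (a stand-in for simulated early detection performance).
watched = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}, "d": {5, 6}}


def coverage(sentinel_set):
    return len(set().union(*(watched[n] for n in sentinel_set)))


chosen = greedy_select(watched, coverage, k=2)
fraction_of_full = coverage(chosen) / coverage(list(watched))
```

Here the second pick is `"d"` rather than `"b"` or `"c"` because the latter two largely duplicate what `"a"` already covers. Comparing the score of a small set with that of the full candidate set mirrors the comparison between a few sentinels and full-network surveillance reported above.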
Conclusions
Although heterogeneity in emergence probability affects optimal sentinel selection, early outbreak detection is governed primarily by network structural characteristics, particularly those related to dynamic sentinel node selection, underscoring the need for adaptable surveillance designs. This research provides a computationally efficient framework and an online tool for designing disease surveillance networks, particularly in resource-limited settings.