Tracing SARS-CoV-2 Clusters Across Local Scales Using Genomic Data
Abstract
A quantitative understanding of local transmission dynamics is essential for designing effective prevention strategies. In this study, we developed a novel algorithm to identify introductions and trace locally circulating clusters. We analyzed over 26,000 SARS-CoV-2 genomes and their associated metadata, collected between January and October 2021, to explore introduction and dispersal patterns in Greater Houston, a major metropolitan area known for its demographic diversity. Our analysis identified more than 1,000 independent introduction events, which gave rise to clusters of varying sizes. Earlier clusters were generally larger and posed greater challenges for control efforts. Characterization of introduction sources revealed that domestic sources contributed more to introductions than international ones. Additionally, analysis of locally circulating clusters highlighted age-structured transmission dynamics. Geographic reconstruction of cluster spread identified Harris County as the primary viral source for surrounding counties. Harris County sustained the local epidemic, with a smaller proportion of new cases driven by external importations and longer persistence times of circulating lineages. Overall, our high-resolution spatiotemporal reconstruction of the epidemic in Greater Houston provides critical insights into the heterogeneous transmission landscape, supporting regional response strategies and public health planning.
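The abstract does not spell out how introductions are identified. One common, deliberately simple way to think about the idea is to treat each maximal clade of a phylogeny whose sampled tips all come from the focal area as one introduction. The sketch below illustrates that idea only. It is not the authors' algorithm. It assumes an already inferred, rooted tree whose tips carry a location label, and it ignores ancestral-state reconstruction, sampling bias, and phylogenetic uncertainty. The `Node` class and the `local_clusters` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node in a rooted phylogeny; only tips carry a location."""
    name: str
    location: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def is_tip(self) -> bool:
        return not self.children


def tips(node: Node) -> List[Node]:
    """Return every tip at or below this node."""
    if node.is_tip():
        return [node]
    out: List[Node] = []
    for child in node.children:
        out.extend(tips(child))
    return out


def all_local(node: Node, focal: str) -> bool:
    """True if every sampled tip below the node is from the focal area."""
    return all(t.location == focal for t in tips(node))


def local_clusters(root: Node, focal: str) -> List[List[str]]:
    """Collect maximal all-local clades; each one counts as one introduction.

    Simplifying assumption: a clade containing any non-focal tip is split,
    and singleton local tips become clusters of size 1.
    """
    clusters: List[List[str]] = []

    def visit(node: Node) -> None:
        if all_local(node, focal):
            clusters.append([t.name for t in tips(node)])
            return  # the clade is maximal, so do not descend further
        for child in node.children:
            visit(child)

    visit(root)
    return clusters


# Toy tree: ((A:Houston, B:Houston), (C:Elsewhere, (D:Houston, E:Houston)))
tree = Node("root", children=[
    Node("n1", children=[Node("A", "Houston"), Node("B", "Houston")]),
    Node("n2", children=[
        Node("C", "Elsewhere"),
        Node("n3", children=[Node("D", "Houston"), Node("E", "Houston")]),
    ]),
])
print(local_clusters(tree, "Houston"))  # [['A', 'B'], ['D', 'E']]
```

In this toy tree, the two Houston clades are separated by a non-local sequence, so they count as two independent introductions. A real workflow would typically rely on ancestral location inference rather than this strict "all tips local" rule.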
Significance Statement
The growing recognition of genome sequencing as a critical tool for outbreak response has driven a rapid increase in the availability of sequence data. Here, we present an analytical workflow to trace imported SARS-CoV-2 clusters using large-scale genome datasets. Our approach pinpoints when, where, and how many introductions occurred, while also tracking the circulation of resulting clusters. By incorporating metrics such as the Source Sink Score, Local Import Score, and Persistence Time, our analysis reveals transmission heterogeneity between subregions of the focal area. These insights are essential for monitoring viral introductions and guiding targeted control measures, enhancing the ability of local responders to address the challenges of current and future pandemics as new variants emerge.
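The Source Sink Score, Local Import Score, and Persistence Time are named here but not defined. The sketch below shows one plausible shape for metrics of this kind, purely to make the concepts concrete. Every definition is a hypothetical assumption and not the authors' formula:

- Persistence time is taken as the span in days between the first and last sampled genome in a cluster.
- The source-sink score is a normalized net flow, (exports − imports) / (exports + imports).
- The import share stands in for a local import score: it is the fraction of a subregion's cases that fall in clusters seeded from outside that subregion.

```python
from datetime import date
from typing import Dict, List


def persistence_days(sample_dates: List[date]) -> int:
    """Hypothetical persistence time: days from first to last sample in a cluster."""
    return (max(sample_dates) - min(sample_dates)).days


def source_sink_score(exports: int, imports: int) -> float:
    """Hypothetical net-flow score in [-1, 1]; positive means a net source."""
    total = exports + imports
    return 0.0 if total == 0 else (exports - imports) / total


def import_share(cases_by_origin: Dict[str, int], region: str) -> float:
    """Hypothetical share of a region's cases seeded from outside the region.

    cases_by_origin maps the inferred origin of each cluster to the number
    of the region's cases belonging to clusters with that origin.
    """
    total = sum(cases_by_origin.values())
    if total == 0:
        return 0.0
    external = total - cases_by_origin.get(region, 0)
    return external / total


# Toy numbers only, not results from the study.
print(persistence_days([date(2021, 3, 1), date(2021, 3, 20), date(2021, 4, 10)]))  # 40
print(source_sink_score(exports=30, imports=10))  # 0.5
print(import_share({"A": 80, "B": 15, "external": 5}, "A"))  # 0.2
```

Under these assumed definitions, a subregion that acts as a source, as the abstract reports for Harris County, would show a positive source-sink score, a low import share, and long persistence times.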