Spatiotemporal structure of SARS-CoV-2 mutational frequencies in wastewater samples from Ontario
Abstract
Since October 2021, the Ontario wastewater surveillance initiative has used next-generation sequencing (NGS) to monitor SARS-CoV-2 RNA in wastewater samples. The fragmented and heterogeneous nature of these data precludes the use of comparative methods that require full-length genome sequences. In this study, we investigate the utility of the inner product of mutation frequency vectors for quantifying the temporal and spatial structure of these data. Raw sequence data were trimmed and mapped to the SARS-CoV-2 reference genome to extract mutation frequencies and coverage statistics. We then excluded samples with incomplete metadata, positions with insufficient coverage (<100 reads), and mutations with frequencies below 1%. For every pair of samples, we calculated the inner product D(x,y) of the respective mutation frequency vectors x and y, and normalized it by √(D(x,x)D(y,y)). In total, we processed 1,619 samples collected from October 2021 to June 2023. The average depth was 7,693 reads, with mean coverage of 24,853 nt. A total of 241,078 mutations were detected in these samples. We restricted our analysis to 20 consecutive months with samples from at least one health region per month. A projection of the resulting distance matrix revealed substantial temporal structure, largely driven by the rapid spread of variants of concern. Genetic similarity, as quantified by the normalized dot product of mutation frequencies, was significantly negatively correlated with the geographic distance between sampling locations. These results suggest that spatial differentiation in the genomic variation of SARS-CoV-2 among wastewater samples can be measured, even at the relatively small scale of a single province.
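The normalized inner product described above, D(x,y) / √(D(x,x)D(y,y)), can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that each sample has already been reduced to a vector of mutation frequencies over a shared, ordered set of mutations, with undetected mutations coded as 0. The function names and the toy frequencies are hypothetical.

```python
import numpy as np


def normalized_inner_product(x, y):
    """Return D(x, y) / sqrt(D(x, x) * D(y, y)) for two frequency vectors.

    Assumption: x and y are aligned to the same ordered set of mutations,
    and a mutation that was not detected in a sample is coded as 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    if denom == 0:
        # Undefined when either sample has no mutations at all.
        return float("nan")
    return float(np.dot(x, y) / denom)


def pairwise_similarity(freqs):
    """Normalized inner products for every pair of rows (samples) in freqs."""
    F = np.asarray(freqs, dtype=float)
    gram = F @ F.T                  # gram[i, j] = D(sample_i, sample_j)
    norms = np.sqrt(np.diag(gram))  # sqrt(D(sample_i, sample_i))
    with np.errstate(divide="ignore", invalid="ignore"):
        return gram / np.outer(norms, norms)


# Toy data (made up for illustration): 3 samples x 4 mutations.
samples = np.array([
    [0.9, 0.1, 0.0, 0.5],
    [0.8, 0.2, 0.0, 0.4],
    [0.0, 0.0, 0.7, 0.1],
])
S = pairwise_similarity(samples)
print(np.round(S, 3))
```

Because mutation frequencies are non-negative, the resulting values lie between 0 and 1, and each sample has similarity 1 with itself. In the toy data, the first two samples share most of their mutations and so score much higher with each other than either does with the third.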