Intrahost dynamics, together with genetic and phenotypic effects, predict the success of viral mutations
Abstract
Predicting the fitness of mutations in the evolution of pathogens is a long-standing and important, yet largely unsolved, problem. In this study, we used SARS-CoV-2 as a model system to explore whether the intrahost diversity of viral infections could provide clues to the relative fitness of single amino acid variants (SAVs). To do so, we analysed ~15 million complete genomes and ~8000 sequencing libraries generated from SARS-CoV-2 infections, which were collected at various timepoints during the COVID-19 pandemic. Across timepoints, we found that many successful SAVs were detected in the intrahost diversity of samples collected earlier, with a median of 6-40 months between the initial collection dates of these samples and the time at which these SAVs reached their highest frequency. Additionally, we found that the co-occurrence of intrahost SAVs significantly captures genetic linkage patterns observed at the interhost level (Pearson's r=0.28-0.45, all p<0.0001). Further, we show that machine learning models can learn highly generalisable intrahost, physicochemical and phenotypic patterns to forecast the future fitness of intrahost SAVs (r²=0.48-0.63). Most of these models performed significantly better when genetic linkage was also considered (r²=0.53-0.68). Overall, our results document the evolutionary forces shaping the fitness of mutations, which may help forecast the emergence of future variants and ultimately inform the design of vaccine targets.
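To make the co-occurrence comparison concrete, the following is a minimal sketch of how intrahost SAV co-occurrence could be correlated with an interhost linkage score using Pearson's r. It is not the authors' pipeline: the Jaccard index as the co-occurrence measure, the function names, the toy samples and the interhost linkage values are all assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt


def cooccurrence(samples):
    """Jaccard co-occurrence of every SAV pair across intrahost samples.

    `samples` is a list of sets, each holding the SAVs detected in one
    sequencing library (hypothetical input format).
    """
    savs = sorted(set().union(*samples))
    scores = {}
    for a, b in combinations(savs, 2):
        both = sum(1 for s in samples if a in s and b in s)
        either = sum(1 for s in samples if a in s or b in s)
        scores[(a, b)] = both / either if either else 0.0
    return scores


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data: made-up intrahost SAV calls for four libraries.
intrahost_samples = [
    {"S:N501Y", "S:E484K"},
    {"S:N501Y", "S:E484K", "S:D614G"},
    {"S:D614G"},
    {"S:N501Y", "S:D614G"},
]

# Made-up interhost linkage scores for the same SAV pairs (assumption).
interhost_linkage = {
    ("S:D614G", "S:E484K"): 0.2,
    ("S:D614G", "S:N501Y"): 0.4,
    ("S:E484K", "S:N501Y"): 0.7,
}

intra = cooccurrence(intrahost_samples)
pairs = sorted(interhost_linkage)
r = pearson_r([intra[p] for p in pairs], [interhost_linkage[p] for p in pairs])
print(f"Pearson's r between intrahost co-occurrence and interhost linkage: {r:.2f}")
```

With real data, the pair scores would come from thousands of libraries and genomes, and a significance test would accompany the coefficient; the sketch only shows the shape of the comparison.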