Fundamental restriction on epistasis detection and fitness valleys in virus evolution
Abstract
Probabilistic prognosis of virus evolution, vital for the design of effective vaccines and antiviral drugs, requires knowledge of the adaptive landscape, including epistatic interactions. Although epistatic interactions can, in principle, be inferred from abundant sequencing data, genetic linkage between evolving sites obscures their signature, imposing fundamental limitations on their detection and requiring averaging over many independent populations. We probe the limits of a detection method based on pairwise correlations conditioned on the state of a third site, using synthetic sequences evolved by a Monte Carlo algorithm with known epistatic pairs. Results demonstrate that the detection error decreases with the number of independent populations and increases with the sequence length. The accuracy is enhanced by moderate recombination and is maximal when the epistasis magnitude approaches the point of full compensation. The method is applied to several thousand sequences of SARS-CoV-2 sampled in three different ways. Results obtained under equal sampling from world regions imply the existence of fitness valleys connecting groups of viral variants.
SIGNIFICANCE
The few epistatic pairs of genomic sites are hidden in genomic data among numerous random correlations caused by common phylogenetic history. We test a method of epistasis detection designed to compensate for this noise. Its accuracy is tested using synthetic sequences generated by a Monte Carlo algorithm with known epistatic pairs. The method is applied to several thousand sequences of SARS-CoV-2 sampled in three different ways. Results obtained under equal sampling from world regions imply the existence of fitness valleys connecting groups of viral variants.
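The idea of a pairwise correlation conditioned on the state of a third site, averaged over independent populations, can be illustrated with a minimal sketch. The text does not specify the exact statistic, so this example assumes binary sequences (0 for the reference allele, 1 for the mutant) and uses a plain Pearson correlation as a stand-in; the function names `pearson`, `conditional_correlation`, and `averaged_conditional_correlation` are hypothetical and are not taken from the article.

```python
from math import sqrt, isnan

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists; NaN if undefined."""
    n = len(x)
    if n < 2:
        return float("nan")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A monomorphic site carries no correlation signal.
        return float("nan")
    return cov / sqrt(var_x * var_y)

def conditional_correlation(population, i, j, k, k_state=0):
    """Correlation between sites i and j among sequences whose site k equals k_state.

    population: list of sequences, each a list of 0/1 alleles (assumed encoding).
    """
    subset = [seq for seq in population if seq[k] == k_state]
    x = [seq[i] for seq in subset]
    y = [seq[j] for seq in subset]
    return pearson(x, y)

def averaged_conditional_correlation(populations, i, j, k, k_state=0):
    """Average the conditioned correlation over independent populations,
    skipping populations where it is undefined."""
    values = [conditional_correlation(p, i, j, k, k_state) for p in populations]
    values = [v for v in values if not isnan(v)]
    if not values:
        return float("nan")
    return sum(values) / len(values)

# Toy population: sites 0 and 1 agree whenever site 2 is 0,
# and disagree whenever site 2 is 1.
toy_population = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]
toy_result = averaged_conditional_correlation([toy_population, toy_population], 0, 1, 2)
```

In this toy data the unconditioned correlation between sites 0 and 1 is diluted by the sequences carrying the mutant allele at site 2, while conditioning on site 2 isolates the subset where the two sites are perfectly associated. Averaging over several populations reflects the point that linkage noise within a single population has to be averaged out.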