Enlarging viral mutation estimation: a view from the distribution of mutation rates
Abstract
The empirical estimation of mutation rates is fundamental to understanding viral evolution. Viral mutation rates are estimated with varied and often complex methods, based on experiments designed essentially to count mutation frequencies. Mutation rates are defined as the probabilities of nucleotide substitutions and are typically reported as a single number, in units of mutations (substitutions) per base (nucleotide) per replication cycle or per cell infection, depending on the replication mode of the virus. Moreover, quantifying the uncertainty of these estimates is so difficult that it is rarely reported in the literature. Values reported for the same virus fall within a broad range, sometimes spanning two orders of magnitude. For instance, mutation rates range from 10⁻⁸ to 10⁻⁶ mutations per base per cell infection for DNA viruses and from 10⁻⁶ to 10⁻⁴ mutations per base per cell infection for RNA viruses. In this paper, we propose an alternative perspective on the estimation of mutation rates that avoids the use of consensus sequences and/or serial passages. Our approach leverages the large amount of data produced by high-throughput sequencing technologies, coupled with an experimental design that performs a single replication cycle from an initial clonal viral population. We propose to replace the single numeric mutation rate with a distribution of mutation rates (DMR), and we provide a procedure to estimate this distribution from sequencing data. Although the focus of this paper is the development of the DMR-centered approach, it is straightforward to derive point and interval estimates of the mutation rates from it, including uncertainty quantification. In addition to estimating the DMR, we provide a theoretical characterization showing that it is well approximated by a log-normal distribution.
Finally, we study some non-trivial properties of the DMR, notably a remarkable invariance when the distribution is down-scaled from the whole genome to its subunits.
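The abstract does not spell out the estimator, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' procedure. It assumes that the DMR is the empirical distribution of per-site substitution frequencies across the genome, that each frequency is the fraction of mutant reads at a site after one replication cycle, and that a log-normal is fitted from the mean and standard deviation of the log-rates. Sequencing-error correction is omitted. All function names, the synthetic-data generator, and its parameters (10⁷ coverage, rates centered at 10⁻⁵) are hypothetical and do not describe real data.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def per_site_rates(mutant_reads, coverage):
    """Naive per-site substitution frequency: mutant reads / total reads.

    Assumption: after a single replication cycle from a clonal population, the
    observed frequency at a site is read directly as that site's rate.
    Sequencing-error correction is deliberately left out of this sketch.
    """
    if len(mutant_reads) != len(coverage):
        raise ValueError("mutant_reads and coverage must have the same length")
    if any(c <= 0 for c in coverage):
        raise ValueError("every site needs positive coverage")
    return [m / c for m, c in zip(mutant_reads, coverage)]


def fit_lognormal(rates):
    """Fit a log-normal via the mean and standard deviation of log(rate).

    Sites with a zero observed rate cannot be log-transformed and are dropped,
    a simplification of this sketch.
    """
    logs = [math.log(r) for r in rates if r > 0]
    if len(logs) < 2:
        raise ValueError("need at least two positive rates")
    return fmean(logs), stdev(logs)


def summarize(mu, sigma, level=0.95):
    """Point estimates and a central interval from the fitted log-normal."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma ** 2 / 2),
        "interval": (math.exp(mu - z * sigma), math.exp(mu + z * sigma)),
    }


def fit_by_segment(rates, bounds):
    """Fit the log-normal separately on each [start, end) slice of the genome.

    Comparing these fits with the whole-genome fit is one hypothetical way to
    probe how the DMR behaves when restricted to genome subunits.
    """
    return [fit_lognormal(rates[start:end]) for start, end in bounds]


def simulate_counts(n_sites, coverage, log_mu, log_sigma, seed=0):
    """Hypothetical synthetic data, not real sequencing results.

    True per-site rates are log-normal; mutant-read counts use a normal
    approximation to Poisson sampling, reasonable when rate * coverage is large.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sites):
        lam = rng.lognormvariate(log_mu, log_sigma) * coverage
        counts.append(max(0, round(rng.gauss(lam, math.sqrt(lam)))))
    return counts, [coverage] * n_sites


if __name__ == "__main__":
    counts, cov = simulate_counts(5000, 10**7, math.log(1e-5), 0.8, seed=1)
    rates = per_site_rates(counts, cov)
    mu, sigma = fit_lognormal(rates)
    print(summarize(mu, sigma))
    print(fit_by_segment(rates, [(0, 2500), (2500, 5000)]))
```

Fitting on the log scale reduces the whole distribution to two numbers, from which the point summaries (median exp(μ), mean exp(μ + σ²/2)) and a central interval follow directly. `fit_by_segment` reuses the same fit on slices of the genome, which is one simple way to set the genome-wide DMR beside the DMRs of its subunits.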