Robustness of Selection and Timing Inference under Model Variation in Population Genetics
Abstract
In population genetics, accurately inferring the selection coefficient and the time of onset of advantageous mutations from genetic data is fundamental to understanding evolutionary processes. Here, we investigate how mismatches between the true evolutionary process and the inference model, specifically in the reproductive variance (σ²) and the number of generations (L), affect the posterior distributions of the selection coefficient and the time of onset. Using the Kolmogorov forward and backward equations, we model the stochastic dynamics of gene frequencies under selection and drift. We show that while the posterior distribution of the selection coefficient is unaffected by changes in σ² and L, this invariance does not extend to the time of onset. By framing the problem as a first-passage-time problem, we derive explicit expressions for the offsets in the posterior mean and variance of the time of onset that result from incorrect assumptions about σ² and L. Our analysis reveals that these offsets are related to the mean and variance of the first passage time required for the allele frequency to reach a certain threshold, starting from an initial frequency determined by the model parameters. Under the assumption of a uniform prior for the time of onset, we find that the offset in the inferred mean is given by the difference in the effective generation duration (Δ = 1/σ²) between the true process and the inference model. We validate our theoretical findings through simulations, demonstrating that the empirical offsets closely match our predictions. Furthermore, we generalize our results to accommodate non-uniform prior distributions, such as exponential priors, and provide numerical methods for calculating offsets under arbitrary priors.
Stochastic fluctuations due to genetic drift, which are influenced by the reproductive variance and generational structure, can introduce significant biases in the posterior distribution of the time of onset of advantageous mutations. By quantifying these biases, our framework enables more accurate adjustments to inferences drawn from genetic data, thereby enhancing our understanding of evolutionary dynamics and improving the reliability of population genetic analyses.
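
The first-passage-time quantity referred to in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or model: it assumes a discrete-generation update in which selection acts deterministically and drift is approximated by Gaussian noise with variance σ² x(1 − x)/N. The function name `first_passage_times` and the parameters `n_pop`, `x0`, `threshold`, `n_runs`, and `max_gen` are hypothetical names introduced for illustration only.

```python
import numpy as np


def first_passage_times(s, sigma2, n_pop, x0, threshold,
                        n_runs=2000, max_gen=10000, seed=0):
    """Return the generation at which each surviving trajectory first
    reaches `threshold`. Trajectories that are lost (frequency 0) or
    that do not reach the threshold within `max_gen` are dropped.

    Assumed model (illustrative, not taken from the paper):
      selection: x -> x(1 + s) / (1 + s x)
      drift:     Gaussian noise with variance sigma2 * x(1 - x) / n_pop
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_runs, x0, dtype=float)
    times = np.full(n_runs, np.nan)
    active = np.ones(n_runs, dtype=bool)

    for gen in range(1, max_gen + 1):
        xa = x[active]
        # Deterministic change in frequency due to selection.
        mean = xa * (1.0 + s) / (1.0 + s * xa)
        # Drift step, scaled by the reproductive variance sigma2.
        sd = np.sqrt(sigma2 * mean * (1.0 - mean) / n_pop)
        x[active] = np.clip(mean + sd * rng.standard_normal(xa.size), 0.0, 1.0)

        hit = active & (x >= threshold)   # first passage happens now
        lost = active & (x <= 0.0)        # allele lost by drift
        times[hit] = gen
        active &= ~(hit | lost)
        if not active.any():
            break

    return times[~np.isnan(times)]


if __name__ == "__main__":
    # Compare the mean and variance of the first passage time under two
    # assumed values of the reproductive variance.
    for sigma2 in (1.0, 2.0):
        t = first_passage_times(s=0.1, sigma2=sigma2, n_pop=1000,
                                x0=0.01, threshold=0.5)
        print(f"sigma2={sigma2}: n={t.size}, "
              f"mean={t.mean():.2f}, var={t.var():.2f}")
```

Comparing the printed mean and variance across values of `sigma2` gives a rough sense of how the reproductive variance shifts the first passage time in this toy setting. It does not reproduce the analytical offsets derived in the paper.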