A population genetics model explaining overdispersion in active transposable elements
Abstract
The number of transposable elements (TEs) per host genome varies within natural populations, with a variance much greater than the mean. This pattern, known as overdispersion, conflicts with classical population genetic models based on the Poisson distribution, which predict equal mean and variance. To address this gap, we develop a stochastic model of TE dynamics using a bi-parental Moran process with recombination that explicitly accounts for the core evolutionary forces: transposition, excision, and purifying selection. From this model, we derive analytical expressions for the mean and variance of the TE copy number. Our results show that overdispersion arises naturally when the transposition rate exceeds the product of the selection coefficient and the mean copy number, and that overdispersion increases with the transposition rate. Additionally, we show that maintaining positive TE copy numbers at equilibrium, and thus sustaining overdispersion, requires a net transposition rate below approximately 0.5 insertions per copy per generation, a constraint that observed TE families satisfy and that is consistent with maintaining genome stability. The derived overdispersion also accounts for the right-skewed, heavy-tailed distribution of copy numbers, a feature that classical models fail to capture. A qualitative comparison of these predictions with data from 18 active TE families in 85 Drosophila melanogaster strains confirms these patterns: all active TEs in the data exhibited overdispersion, with variances 2–10 times the mean and copy-number distributions showing positive (right) skewness. Collectively, our findings reveal that TE distributions deviate from Poisson expectations and establish overdispersion as an inherent feature of TE population dynamics, providing a mechanistic framework for understanding the full distributional properties of TE copy numbers.
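The two summary statistics the abstract relies on, the variance-to-mean ratio and the skewness, can be computed from per-strain copy numbers as in the minimal Python sketch below. It is an illustration only, not the authors' analysis code: the function names, the sample `copies`, and the parameter values `u` and `s` are hypothetical, and the skewness estimator (the plain moment ratio) is an assumed choice. The helper `meets_overdispersion_condition` simply restates the inequality quoted in the abstract (transposition rate greater than selection coefficient times mean copy number); it does not reproduce the paper's derived expressions for the mean and variance.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for Poisson counts, above 1 if overdispersed."""
    m = mean(counts)
    if m <= 0:
        raise ValueError("mean copy number must be positive")
    # statistics.variance is the unbiased sample variance (n - 1 denominator).
    return variance(counts) / m


def skewness(counts):
    """Moment-ratio sample skewness; positive values indicate a longer right tail."""
    m = mean(counts)
    n = len(counts)
    m2 = sum((x - m) ** 2 for x in counts) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in counts) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all counts are equal")
    return m3 / m2 ** 1.5


def meets_overdispersion_condition(u, s, mean_copies):
    """Condition as stated in the abstract: transposition rate u exceeds s * mean copy number."""
    return u > s * mean_copies


# Hypothetical copy numbers of one TE family across ten strains (not real data).
copies = [2, 3, 3, 4, 5, 5, 6, 8, 12, 22]

print("mean copy number:", mean(copies))
print("dispersion index:", round(dispersion_index(copies), 2))
print("skewness:", round(skewness(copies), 2))

# Hypothetical rates, chosen only to show how the inequality is evaluated.
u, s = 0.01, 0.001
print("condition met:", meets_overdispersion_condition(u, s, mean(copies)))
```

A dispersion index above 1 together with positive skewness is the qualitative signature the abstract reports for the active TE families; a Poisson sample would give an index near 1.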