Identifying single origin rare variants in population genomic data

Abstract

Genomic analyses have shown that some mutations in large population genomic datasets may be the result of repeated, independent events at the same locus. However, the possibility of such recurrent mutation is often ignored, even when it has the potential to introduce errors, for example when co-ancestry is inferred from variant sharing in demographic analyses. Even rare variants such as doubletons, which should be particularly informative about recent demography, may have multiple origins despite arising relatively recently in the population. Here, we develop methods first to estimate the frequency of recurrent doubletons in a population genomic dataset from the occurrence of tri-allelic sites with two different singleton mutations, and then to identify a subset of high-confidence single-origin doubletons based on the presence of a linked rare variant on the surrounding shared haplotype. Applying these methods to data for the malaria mosquito Anopheles gambiae, we estimate that up to ∼16% of reported doubletons have independent origins. We then identify a set of doubletons likely (∼99%) to have a single origin, which consists of ∼68% of all the expected single-origin doubletons (and ∼57% of all observed doubletons). We assess the effectiveness of our approach, and then use these doubletons to test population genetic hypotheses about recombination, selection, and isolation by distance. The methods developed here should be useful for demographic inference when populations are large enough that recurrent mutation cannot be ignored.
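
The percentages quoted above can be checked against one another with a short calculation. The sketch below is not the authors' method; it only assumes that the ∼68% figure refers to the share of truly single-origin doubletons captured by the selected set, and that ∼16% is used as the recurrent fraction even though the abstract states it as an upper estimate. The function and variable names are hypothetical.

```python
# Approximate values quoted in the abstract.
recurrent_fraction = 0.16     # upper estimate of doubletons with independent origins
purity = 0.99                 # chance that a selected doubleton has a single origin
recall_single_origin = 0.68   # share of expected single-origin doubletons selected


def selected_share_of_all(recurrent, purity, recall):
    """Share of all observed doubletons that fall in the selected set.

    Assumption (not from the source): `recall` counts only the truly
    single-origin doubletons in the selected set, so the set itself is
    slightly larger by a factor of 1 / purity.
    """
    single_origin = 1.0 - recurrent              # expected single-origin share
    true_single_selected = recall * single_origin
    return true_single_selected / purity


share = selected_share_of_all(recurrent_fraction, purity, recall_single_origin)
print(f"selected set is about {share:.1%} of all observed doubletons")
```

Under these assumptions the selected set comes to roughly 57 to 58% of all observed doubletons, in line with the ∼57% reported. If the ∼68% figure is instead read as the size of the whole selected set relative to the expected single-origin count, the division by `purity` drops out and the result is 0.68 × 0.84 ≈ 57%.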
