Allele age estimators designed for whole genome datasets show only a modest decrease in accuracy when applied to whole exome datasets

Abstract

Personalized genomics in the healthcare system is becoming increasingly accessible as the cost of sequencing decreases. As the number of sequenced genomes grows, ever larger numbers of rare variants are being discovered, and considerable effort is going into identifying their functional effects on disease phenotypes. One way to characterize these variants is to estimate the time at which each mutation entered the population, that is, its allele age. However, allele age estimators such as Relate, the Genealogical Estimator of Variant Age (GEVA), and time of coalescence were developed under the assumption that datasets include the entire genome. We examined the performance of each of these estimators on simulated exome data under a neutral constant population size model and found that each provides usable estimates of allele age from whole-exome datasets. To test the robustness of these methods, we also simulated data under a population expansion model and with background selection. Relate performed best of the three estimators, with Pearson correlation coefficients of 0.64 and 0.68 under the neutral constant-size and expansion models, respectively, and drops in accuracy of 17 percent and 15 percent, respectively, between whole-genome and whole-exome estimates.

Of the three estimators, Relate parallelizes best, yielding quick results with few computational resources. However, even Relate scales only to thousands of samples, well short of the hundreds of thousands of samples in the datasets now being released. While more work is needed to expand the capabilities of current allele age estimation methods, these methods estimate the age of mutations from exome data with only a modest decrease in performance.
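The accuracy figures in the abstract are Pearson correlations between estimated and true (simulated) allele ages, compared between whole-genome and whole-exome estimates. The following is a minimal Python sketch of that comparison, not the authors' code. It assumes that the "drop in accuracy" is the relative decrease in Pearson r, (r_genome - r_exome) / r_genome, which the abstract does not state explicitly; the function names are hypothetical, and the values in the usage section are illustrative placeholders, not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def relative_drop(r_genome: float, r_exome: float) -> float:
    """Relative decrease in accuracy (assumed definition, hypothetical helper)."""
    if r_genome == 0:
        raise ValueError("genome-wide correlation must be non-zero")
    return (r_genome - r_exome) / r_genome


if __name__ == "__main__":
    # Illustrative placeholder values only; not results from the study.
    true_age = [120.0, 850.0, 40.0, 3000.0, 600.0]
    est_genome = [150.0, 700.0, 60.0, 2600.0, 650.0]
    est_exome = [300.0, 500.0, 90.0, 2000.0, 900.0]
    r_g = pearson_r(true_age, est_genome)
    r_e = pearson_r(true_age, est_exome)
    print(f"r_genome={r_g:.2f} r_exome={r_e:.2f} drop={relative_drop(r_g, r_e):.0%}")
```

Because allele ages can span orders of magnitude, correlating log-transformed ages is a common alternative; whether the study did so is not stated in the abstract.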

Article Summary

The increasing availability of whole-exome sequencing yields large numbers of rare variants that can directly affect disease phenotypes. Many methods exist for identifying the functional impact of mutations, including estimating the time at which a mutation entered a population. Popular methods for estimating this allele age infer it from haplotypes and assume whole-genome data. We simulated genome and exome data under constant-size and expansion demographic models and found that all three methods lose accuracy on exome data, by 15-30% depending on the method. To test the robustness of the best-performing method, Relate, we ran further simulations that introduced background selection and varied the sample size, with similar results.
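One simple way to derive an exome dataset from a genome-wide simulation is to keep only the variants that fall inside exonic target intervals. The abstract does not describe how the exome data were produced, so this Python sketch is an assumption for illustration only: intervals are taken to be sorted, non-overlapping, half-open (start, end) coordinates, and `restrict_to_intervals` is a hypothetical helper. Haplotype-based estimators run on such a filtered set no longer see the intervening non-coding variation, which is presumably part of why accuracy drops.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def restrict_to_intervals(
    positions: Sequence[int], intervals: Sequence[Tuple[int, int]]
) -> List[int]:
    """Keep positions lying inside any half-open [start, end) interval.

    Hypothetical helper; assumes `intervals` is sorted by start and
    non-overlapping.
    """
    starts = [start for start, _ in intervals]
    kept = []
    for pos in positions:
        # Index of the last interval whose start is <= pos (-1 if none).
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            kept.append(pos)
    return kept


if __name__ == "__main__":
    # Hypothetical exon coordinates and simulated variant positions.
    exons = [(10, 20), (30, 40)]
    variant_positions = [5, 10, 15, 25, 30]
    print(restrict_to_intervals(variant_positions, exons))  # [10, 15, 30]
```

BED files use the same 0-based, half-open convention, so target intervals read from a BED file fit this sketch directly if variant positions are also 0-based.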