Allele age estimators designed for whole-genome datasets show only a moderate reduction in performance when applied to whole-exome datasets

Abstract

As personalized genomics becomes more affordable, larger numbers of rare variants are being discovered, prompting important initiatives to identify their functional impacts in relation to disease phenotypes. One way to characterize these variants is to estimate when each mutation entered the population. However, allele age estimators such as those implemented in the programs Relate, Genealogical Estimation of Variant Age (GEVA), and Runtc were developed on the assumption that datasets cover the entire genome. We examined the performance of each of these estimators on simulated exome data under a neutral constant population size model, as well as under population expansion and background selection models. We found that each provides usable estimates of allele age from whole-exome datasets. Relate performs best of the 3 estimators, with Pearson correlation coefficients of 0.83 and 0.73 against the true simulated allele ages under the neutral constant-size and expansion models, respectively; these values reflect 12% and 20% decreases in correlation from whole-genome to whole-exome estimation. Of the 3 estimators, Relate also parallelizes most effectively, yielding quick results with few computational resources; however, Relate currently scales only to thousands of samples, so it cannot yet handle the hundreds of thousands of samples in datasets now being released. While more work is needed to expand the capabilities of current methods of estimating allele age, these methods show only a modest decrease in performance when estimating the age of mutations from whole-exome data.
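The accuracy comparison described above can be illustrated with a minimal sketch, which is not the authors' pipeline. It computes a Pearson correlation between true and estimated allele ages and then the drop in correlation from whole-genome to whole-exome estimates. The synthetic ages, the noise levels, and the helper names `pearson` and `relative_decrease` are all hypothetical, and treating the reported percentage decrease as a relative change is an assumption, since the abstract does not state how it was computed.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def relative_decrease(r_genome, r_exome):
    """Fractional drop in correlation from genome-based to exome-based estimates.

    Assumption: the decrease is expressed relative to the whole-genome value.
    """
    return (r_genome - r_exome) / r_genome


# Synthetic stand-ins for simulated data (hypothetical values, in generations).
rng = random.Random(0)
true_age = [rng.expovariate(1 / 1000) for _ in range(500)]
# Assumed noise model: multiplicative log-normal error, larger for exome data.
genome_est = [t * math.exp(rng.gauss(0, 0.3)) for t in true_age]
exome_est = [t * math.exp(rng.gauss(0, 0.8)) for t in true_age]

r_genome = pearson(true_age, genome_est)
r_exome = pearson(true_age, exome_est)
print(f"whole-genome r = {r_genome:.2f}")
print(f"whole-exome  r = {r_exome:.2f}")
print(f"relative decrease = {relative_decrease(r_genome, r_exome):.1%}")
```

With real output, `true_age` would come from the simulation's recorded mutation times and the two estimate lists from running an estimator on the whole-genome and exome-restricted versions of the same simulated samples.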
