Revealing the range of maximum likelihood estimates in the admixture model

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Many ancestry inference tools, including STRUCTURE and ADMIXTURE, rely on the admixture model to infer both, allele frequencies p and individual admixture proportions q for a collection of individuals relative to a set of hypothetical ancestral populations. We show that under realistic conditions the likelihood in the admixture model is typically flat in some direction around a maximum likelihood estimate (MLE) . In particular, the maximum likelihood estimator is non-unique and there is a complete spectrum of possible estimates. Common inference tools typically identify only a few points within this spectrum.

We provide an algorithm which computes the set of equally likely , when starting from . It is analytic for K = 2 ancestral populations and numeric for K > 2. We apply our algorithm to data from the 1000 genomes project, and show that inter-European estimators of q can come with a large set of equally likely possibilities. In general, markers with large allele frequency differences between populations in combination with individuals with concentrated admixture proportions lead to small areas with a flat likelihood.

Our findings imply that care must be taken when interpreting results from STRUCTURE and ADMIXTURE if populations are not separated well enough.

Article activity feed