Faster model-based estimation of ancestry proportions

Cindy G. Santander
Alba Refoyo Martinez
Jonas Meisner

Listed in

Evaluated articles (Peer Community in Evolutionary Biology)

Abstract

Ancestry estimation from genotype data in unrelated individuals has become an essential tool in population and medical genetics to understand demographic population histories and to model or correct for population structure. The ADMIXTURE software is a widely used model-based approach to account for population stratification; however, it struggles with convergence issues and does not scale to modern human datasets or to the large number of variants in whole-genome sequencing data. Likelihood-free approaches optimize a least-squares objective and have gained popularity in recent years due to their scalability. However, this scalability comes at the cost of accuracy in the ancestry estimates in more complex admixture scenarios. We present a new model-based approach, fastmixture, which adopts aspects of likelihood-free approaches for parameter initialization, followed by a mini-batch expectation-maximization procedure to model the standard likelihood. In a simulation study, we demonstrate that the model-based approaches of fastmixture and ADMIXTURE are significantly more accurate than recent likelihood-free approaches. We further show that fastmixture runs approximately 30× faster than ADMIXTURE on both simulated and empirical data from the 1000 Genomes Project, such that our model-based approach scales to much larger sample sizes than previously possible.
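To make the "standard likelihood" and the expectation-maximization idea concrete, the following is a minimal NumPy sketch of the classical binomial admixture model and one full-data EM update. It is an illustration under stated assumptions, not the fastmixture implementation: the function names (log_likelihood, em_step), the random initialization, the clipping constant eps, and the toy simulation are all hypothetical, and neither the likelihood-free initialization nor the mini-batch scheme mentioned in the abstract is reproduced here.

```python
import numpy as np


def log_likelihood(G, Q, F):
    """Binomial admixture log-likelihood (constant terms dropped).

    G: (N, M) genotypes in {0, 1, 2}
    Q: (N, K) ancestry proportions, rows sum to 1
    F: (M, K) ancestral allele frequencies in (0, 1)
    """
    H = Q @ F.T  # individual-specific allele frequencies, shape (N, M)
    return float(np.sum(G * np.log(H) + (2.0 - G) * np.log1p(-H)))


def em_step(G, Q, F, eps=1e-5):
    """One full-data EM update of Q and F (hypothetical helper)."""
    H = Q @ F.T
    A = G / H                    # weight of the counted allele
    B = (2.0 - G) / (1.0 - H)    # weight of the other allele
    # Expected allele counts assigned to each ancestral population.
    F_alt = F * (A.T @ Q)
    F_ref = (1.0 - F) * (B.T @ Q)
    Q_new = Q * (A @ F + B @ (1.0 - F)) / (2.0 * G.shape[1])
    # Keep frequencies away from 0 and 1 so the logarithms stay finite.
    F_new = np.clip(F_alt / (F_alt + F_ref), eps, 1.0 - eps)
    # Rows already sum to 1 in exact arithmetic; renormalize for safety.
    Q_new /= Q_new.sum(axis=1, keepdims=True)
    return Q_new, F_new


# Toy simulation (all sizes and ranges are arbitrary assumptions).
rng = np.random.default_rng(0)
N, M, K = 50, 200, 3
Q_true = rng.dirichlet(np.ones(K), size=N)
F_true = rng.uniform(0.05, 0.95, size=(M, K))
G = rng.binomial(2, Q_true @ F_true.T).astype(float)

# Random starting point, standing in for a smarter initialization.
Q = rng.dirichlet(np.ones(K), size=N)
F = rng.uniform(0.1, 0.9, size=(M, K))

ll_start = log_likelihood(G, Q, F)
for _ in range(50):
    Q, F = em_step(G, Q, F)
ll_end = log_likelihood(G, Q, F)
```

Each EM step keeps the rows of Q on the simplex and should not decrease the log-likelihood, apart from the small effect of clipping F. A mini-batch variant would presumably apply updates of this kind to subsets of variants, but the batching details are not given in the abstract and are left out of this sketch.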

Version published to 10.24072/pcjournal.503
Dec 12, 2024
Peer Community in Evolutionary Biology
Nov 18, 2024

Version published to 10.1101/2024.07.08.602454 on bioRxiv
Jul 11, 2024

