Unsupervised Discovery of Ancestry Informative Markers and Genetic Admixture Proportions in Biobank-Scale Data Sets

Seyoon Ko
Benjamin B. Chu
Daniel Peterson
Chidera Okenwa
Jeanette C. Papp
David H. Alexander
Eric M. Sobel
Hua Zhou
Kenneth L. Lange

Abstract

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWAS). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10⁵ to 10⁶ samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg problem: the ancestry labels needed to select AIMs are themselves what admixture estimation is meant to infer. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank data sets. Our implementation of the method is called OpenADMIXTURE.
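
To make "likelihood-based estimation" concrete, here is a minimal sketch of the binomial log-likelihood that programs in the ADMIXTURE family maximize over admixture proportions and allele frequencies. It is an illustration only, not OpenADMIXTURE's code: the function name `admixture_loglik`, the toy genotypes, and the toy frequencies are all hypothetical. The example also shows why AIMs help: a SNP whose frequency is the same in every population contributes the same amount to the likelihood whatever the admixture proportions are.

```python
import math

def admixture_loglik(genotypes, q, p, eps=1e-12):
    """Binomial admixture log-likelihood, up to an additive constant.

    genotypes[i][j]: allele count (0, 1, or 2) for individual i at SNP j,
                     or None if missing
    q[i][k]: admixture proportion of individual i from population k
    p[k][j]: allele frequency of SNP j in population k
    """
    n_pops = len(p)
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            if g is None:  # skip missing genotypes
                continue
            # Individual-specific allele frequency: mixture over populations.
            f = sum(q[i][k] * p[k][j] for k in range(n_pops))
            f = min(max(f, eps), 1.0 - eps)  # keep the logarithms finite
            total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total

# Toy data (hypothetical): one individual, two populations, two SNPs.
# SNP 0 is informative (0.9 vs 0.1); SNP 1 is not (0.5 in both populations).
P = [[0.9, 0.5],
     [0.1, 0.5]]
G = [[2, 1]]

ll_pop0 = admixture_loglik(G, [[1.0, 0.0]], P)  # all ancestry from population 0
ll_pop1 = admixture_loglik(G, [[0.0, 1.0]], P)  # all ancestry from population 1
print(ll_pop0 > ll_pop1)
```

In this toy case the whole difference between the two log-likelihoods comes from SNP 0; SNP 1 adds the same term to both, so dropping it would not change which proportions are preferred.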

Version published to 10.1101/2022.10.22.513294 on bioRxiv
Oct 24, 2022

Related articles

XMR: A cross-population Mendelian randomization method for causal inference using genome-wide summary statistics

This article has 5 authors:
1. Can Yang
2. Xinrui Huang
3. Zitong Chao
4. Zhiwei Wang
5. Xianghong Hu
This article has no evaluations. Latest version: Mar 20, 2026

Admixture Mapping Identifies Ancestry-Associated Loci Linked to Hyperandrogenism and Insulin Resistance in HAIR-AN

This article has 6 authors:
1. Luís Jesuíno de Oliveira Andrade
2. Gabriela Correia Matos de Oliveira
3. Paulo Roberto Santana de Melo
4. Alcina Maria Vinhaes Bittencourt
5. Osmário Jorge de Mattos Salles
6. Luís Matos de Oliveira
This article has no evaluations. Latest version: Feb 19, 2026

RNAScope-Ancestry: A Cross-Modality Framework for Inferring Genetic Ancestry from RNA-Seq with Application to MECA

This article has 14 authors:
1. Rashi Verma
2. Shivam Sharma
3. Harriet NA Blankson
4. Emine Guven
5. Andrea Pearson
6. Charles D. Searles
7. Peter Baltrus
8. Tene T. Lewis
9. Priscilla Pemu
10. Dean Jones
11. Arshed Ali Quyyumi
12. Herman Taylor
13. I. King Jordan
14. Robert Meller
This article has no evaluations. Latest version: Mar 8, 2026
