KANN: estimation of genetic ancestry profiles by nearest neighbor regression

Juha Riikonen
Sini Kerminen
Aki Havulinna
Matti Pirinen

Abstract

State-of-the-art methods for inferring individual-level genetic ancestry are based on statistical models for haplotype data. These methods are computationally demanding, which makes them impractical for biobank-scale analyses. Here we describe KANN, an efficient k-nearest neighbor regression method that estimates individual-level ancestry with respect to predefined source populations using only principal components of genetic structure. Unlike existing tools, which can only use reference samples with a discrete source population assignment, KANN can use reference samples with continuous ancestry profiles across multiple source populations. We illustrate KANN on a data set of 18,125 Finnish samples from THL Biobank, estimating ancestry profiles across up to 10 Finnish source populations. KANN’s ancestry estimates agree well with those of the haplotype-based method SOURCEFIND, with a correlation of at least 0.859 in each of the 10 source populations, making KANN a promising tool for ancestry estimation in large-scale genomic studies.
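
To make the idea of nearest neighbor regression on principal components concrete, the following is a minimal sketch and not the authors' implementation. It assumes Euclidean distance in PC space, an unweighted average over the k nearest reference samples, and a brute-force neighbor search; the function name `knn_ancestry`, the value of `k`, and the toy data are all hypothetical choices made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def knn_ancestry(ref_pcs, ref_profiles, target_pcs, k=3):
    """Estimate ancestry profiles by averaging the profiles of the
    k nearest reference samples in principal component space.

    Assumptions (not taken from the paper): Euclidean distance,
    unweighted mean, brute-force search.
    """
    ref_pcs = np.asarray(ref_pcs, dtype=float)            # (n_ref, n_pcs)
    ref_profiles = np.asarray(ref_profiles, dtype=float)  # (n_ref, n_sources)
    target_pcs = np.asarray(target_pcs, dtype=float)      # (n_target, n_pcs)

    # Squared Euclidean distance between every target and every reference.
    diffs = target_pcs[:, None, :] - ref_pcs[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)                    # (n_target, n_ref)

    # Indices of the k closest reference samples for each target.
    nearest = np.argsort(sq_dist, axis=1)[:, :k]          # (n_target, k)

    # Average the (possibly continuous) reference profiles of the neighbors.
    return ref_profiles[nearest].mean(axis=1)             # (n_target, n_sources)


# Toy data: two clusters in a 2-PC space, two source populations.
# Reference profiles are continuous, and each row sums to 1.
ref_pcs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
ref_profiles = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
                [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
target_pcs = [[0.05, 0.05], [5.05, 5.05]]

estimates = knn_ancestry(ref_pcs, ref_profiles, target_pcs, k=3)
print(estimates)
```

Because each estimate is a plain average of reference profiles that sum to one, the estimated profile also sums to one. The dense distance matrix used here is for illustration only; at biobank scale a spatial index or approximate neighbor search would likely be needed.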

Version published to 10.1101/2025.08.27.671485 on bioRxiv
Aug 28, 2025

Related articles

Genetic estimates of relatedness: Established practices and new opportunities through low coverage whole genome sequencing

This article has 8 authors:
1. Annika Freudiger
2. Natalie Kestel
3. Vladimir Jovanovic
4. Mariana Madruga de Brito
5. Angelina Ruiz-Lambides
6. Katja Nowick
7. Anja Widdig
8. Harald Ringbauer
This article has no evaluations. Latest version: Jan 23, 2026

Derivation of prediction error variance for non-genotyped individuals in genomic selection

This article has 3 authors:
1. Vinícius Junqueira
2. Marcos Jun-Iti Yokoo
3. Fernando Flores
This article has no evaluations. Latest version: Dec 17, 2025

An Advanced Entropy Approach for Minimizing False Discoveries in Imputation-Based Association Analyses

This article has 4 authors:
1. Zhihui Zhang
2. Dakai Zhu
3. Xiangjun Xiao
4. Christopher I. Amos
This article has no evaluations. Latest version: Dec 17, 2025
