Optimising Genotype Imputation for Precise Genetic Association in Forensic Phenotype Prediction and Trait Studies

Zehra Koksal
Andreas Tillmar


Abstract

The imputation of single nucleotide polymorphisms (SNPs) provides a low-cost way to augment the size of genotyped SNP panels. Genotype imputation is commonly applied to study genotype-phenotype correlations in medical and population genetics, and has great, yet unexplored, potential in a forensic context. Forensic DNA phenotyping, i.e., the prediction of phenotypic traits based on SNPs, can greatly benefit from imputing the missing DNA markers needed to use available prediction models and to implement novel ones. Currently, however, most imputation studies investigate the performance of random SNPs, with limited focus on SNPs involved in phenotypic expression or association.

In the current study, individuals from the 1000 Genomes Project with high predicted trait diversity were used to explore the imputation accuracy of SNPs leveraged in phenotype prediction models and SNPs associated with facial traits compared to all SNPs. Further, the performance of the HIrisPlex-S prediction model for phenotypic traits was investigated using different imputed datasets.

Firstly, we corroborated that the number and selection of SNPs in the genotype dataset and the minor allele frequency (MAF) are major drivers of imputation call and error rates. Secondly, we found higher imputation error rates for phenotypic SNPs than for randomly selected SNPs, attributable to MAF differences. Further, we corroborated earlier findings of lower imputation error rates for SNPs in coding regions than in non-coding regions, owing to increased linkage. When investigating the impact of imputation on trait prediction with the HIrisPlex-S prediction model, we observed that prediction improved for datasets with more genotyped SNPs and for phenotypes with more observations in the reference panel. Finally, we provide novel evidence that more lenient calling thresholds for SNP imputation improve trait prediction, because missing genotypes are more detrimental to trait prediction accuracy than imputation errors.
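
The trade-off between call rate and error rate under different calling thresholds can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`call_genotype`, `call_and_error_rate`), the posterior genotype probabilities, and the "true" genotypes are all hypothetical values chosen for illustration. The sketch assumes that an imputed genotype is called only when its highest posterior probability reaches the threshold, and is otherwise left missing.

```python
from typing import Optional, Sequence, Tuple


def call_genotype(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the most probable genotype (0, 1 or 2 alternate alleles),
    or None (missing) if its probability is below the calling threshold."""
    best = max(range(len(probs)), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def call_and_error_rate(
    prob_rows: Sequence[Sequence[float]],
    truth: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Call rate = called / all SNPs; error rate = wrong calls / called SNPs."""
    calls = [call_genotype(p, threshold) for p in prob_rows]
    called = [(c, t) for c, t in zip(calls, truth) if c is not None]
    call_rate = len(called) / len(calls)
    error_rate = sum(c != t for c, t in called) / len(called) if called else 0.0
    return call_rate, error_rate


# Hypothetical posterior genotype probabilities for four imputed SNPs
# (invented for illustration, not taken from the study).
probs = [
    (0.98, 0.02, 0.00),
    (0.10, 0.85, 0.05),
    (0.55, 0.40, 0.05),
    (0.05, 0.35, 0.60),
]
truth = [0, 1, 1, 2]  # hypothetical true genotypes

for threshold in (0.9, 0.5):
    call_rate, error_rate = call_and_error_rate(probs, truth, threshold)
    print(f"threshold={threshold}: call rate={call_rate}, error rate={error_rate}")
```

With these invented values, the strict threshold (0.9) calls only one of four SNPs without error, while the lenient threshold (0.5) calls all four at the cost of one wrong genotype. Whether the additional calls outweigh the additional errors depends on the downstream prediction model, which is the question the study addresses for HIrisPlex-S.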

Our findings, which show different imputation performance for general SNPs compared to phenotype-associated and prediction-model SNPs, highlight the importance of investigating imputation performance for the SNPs of interest. Further, we report optimal trait predictions when a lenient calling threshold for imputed SNP genotypes is paired with a SNP panel with high linkage, which shows the high applicability of SNP imputation for phenotypic trait prediction. Because prediction models differ, we recommend testing imputation for each prediction model of interest.

Highlights

Number and selection of SNPs in imputation input and MAF impact call and error rate
Lower imputation accuracy of phenotype-associated SNPs compared to random SNPs
Lower imputation error rates for SNPs in coding over non-coding regions
High abundance of phenotype in reference panel favours its prediction
Most accurate trait predictions for lenient SNP calling thresholds for imputation

Version published to 10.1101/2025.08.01.668059 on bioRxiv
Aug 4, 2025

