Genotype imputation and error estimation in connected multiparental populations
Abstract
Multiparental populations have been produced for quantitative trait loci (QTL) mapping in many crops, and next-generation sequencing has become a cost-effective tool for genotyping them. We previously developed MagicImpute_mma, a hidden Markov framework for genotype imputation in a multiparental population, implemented in Mathematica. However, its computational time increases quickly with the number of parents. In this work, we extend MagicImpute_mma into MagicImpute to improve computational efficiency and robustness to various types of errors. In particular, MagicImpute has the following novel features: (1) it allows for multiple multiparental populations that may be connected by shared parents, (2) it allows for many missing parents that are not available for sequencing, (3) it accounts for allelic bias and overdispersion in next-generation sequencing data, (4) it infers marker-specific error rates and filters for markers with low error rates, and (5) it is implemented in the high-performance Julia language. In addition to extensive simulation studies, we evaluate MagicImpute on three real datasets: the rice F2 population with low sequencing depth, the apple F1 population with outbred parents, and the sorghum multi-parent advanced generation inter-cross (MAGIC) population in which 10 male sterile lines (out of 29 parents) are missing. The results show that MagicImpute is accurate for genotype imputation in connected bi- or multi-parental populations with various types of sequencing errors, and that it opens up new opportunities for QTL mapping after imputing many missing parents.