Genotype Imputation from Low-Coverage WGS Using Haplotype Reference Panels in Cultivated Strawberry

Abstract

Background: To implement high-throughput sequencing-based genotyping in a strawberry (Fragaria × ananassa) breeding program, we aimed to construct a haplotype reference panel and explore its utility through genotype dosage imputation of low-coverage (1×) sequencing data. Although genotyping by whole-genome sequencing (WGS) provides high SNP density, its cost remains a limitation for large-scale application. Imputation from low-coverage data using a reference panel offers a cost-effective alternative, but this approach has not yet been optimized for allo-octoploid strawberry.

Results: To reduce genotyping errors that limit phasing accuracy, we combined high sequencing depth (>15×) with variant filtering based on average allele balance (AAB), linkage disequilibrium (LD), and Mendelian error rates (MER). Statistical phasing using SHAPEIT5 resulted in a mean switch error rate of 0.9%, with 50% of the genome covered by haplotype blocks of at least 654 kb without phase switches (QHN50). To evaluate downstream imputation, samples from three genetically distinct populations (California, Florida, and HCFF) were downsampled to 1× and imputed with GLIMPSE2 using reference panels of varying size and composition. Both panel size and genetic diversity influenced imputation accuracy, with concordance rates ranging from 0.87 to 0.97 for the smallest panel and from 0.94 to 0.98 for the largest, excluding three outliers.

Conclusions: These findings demonstrate that constructing a large, genetically diverse haplotype reference panel improves genotype dosage imputation from low-coverage sequencing data. However, high accuracy is still achievable with limited resources, making this a cost-efficient alternative to SNP arrays when adopting WGS-based genotyping in breeding programs. The strategy is broadly applicable to other crops where dense genotyping is needed but resources are limited. In such cases, sequencing approximately 50 genetically representative samples at ≥25× depth is recommended for building a reference panel suitable for imputation.
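
The two accuracy metrics quoted in the abstract, switch error rate for phasing and concordance rate for imputation, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the diploid-style 0/1/2 dosage encoding, and the exact definitions (concordance as the fraction of matching non-missing hard calls; switch error rate as phase flips between consecutive heterozygous sites) are assumptions made for illustration, and the article may define these metrics differently.

```python
from typing import Optional, Sequence, Tuple

Hap = Tuple[int, int]  # (allele on haplotype 1, allele on haplotype 2)


def concordance_rate(truth: Sequence[Optional[int]],
                     imputed: Sequence[Optional[int]]) -> Optional[float]:
    """Fraction of sites where the imputed dosage (0/1/2) equals the truth.

    Sites missing (None) in either call set are skipped. Returns None when
    no site can be compared. This simple definition is an assumption.
    """
    compared = 0
    matches = 0
    for t, i in zip(truth, imputed):
        if t is None or i is None:
            continue
        compared += 1
        if t == i:
            matches += 1
    return matches / compared if compared else None


def switch_error_rate(true_haps: Sequence[Hap],
                      est_haps: Sequence[Hap]) -> Optional[float]:
    """Phase switches between consecutive heterozygous sites, divided by
    the number of consecutive heterozygous pairs.

    Sites that are homozygous in the truth, or whose estimated genotype
    disagrees with the truth, carry no phase information and are skipped.
    """
    orientations = []
    for t, e in zip(true_haps, est_haps):
        if t[0] == t[1]:
            continue  # homozygous: phase is undefined
        if e == t:
            orientations.append(0)  # same orientation as the truth
        elif e == (t[1], t[0]):
            orientations.append(1)  # flipped relative to the truth
        # otherwise the genotypes differ; skip the site
    if len(orientations) < 2:
        return None
    switches = sum(1 for a, b in zip(orientations, orientations[1:]) if a != b)
    return switches / (len(orientations) - 1)


# Toy data, invented purely for illustration.
truth_dosage = [0, 1, 2, 1, None]
imputed_dosage = [0, 1, 1, 1, 2]
toy_concordance = concordance_rate(truth_dosage, imputed_dosage)

true_phase = [(0, 1), (0, 1), (1, 0), (0, 0), (0, 1)]
est_phase = [(0, 1), (1, 0), (0, 1), (0, 0), (0, 1)]
toy_ser = switch_error_rate(true_phase, est_phase)
```

In the toy data, three of the four comparable sites match, and the estimated phase flips twice across the three consecutive heterozygous pairs. Real evaluations would read genotypes from VCF/BCF files and typically stratify results by allele frequency, which this sketch omits.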
