The Illusion of Polygenicity in Pool-seq Genetic Mapping Studies: Insufficient Power Can Mask Simple Genetic Architectures
Abstract
Pool-seq (pooled sequencing) combines DNA from multiple individuals prior to sequencing, enabling population-level allele frequency estimation without individual genotyping. When employed in genome-wide association studies (GWAS), pool-seq faces a fundamental power limitation: the error on each allele frequency estimate is set by sequencing coverage and shrinks only as coverage increases. Although this limitation is widely appreciated, pool-seq GWAS lacking unambiguous hits are often interpreted as evidence of a highly polygenic genetic architecture. We illustrate the limits of inferring architecture from Manhattan plots using empirical data from a Drosophila zinc resistance mapping study. Despite an average of >700× sequencing coverage in the case and control pools, a GWAS based on directly ascertained SNPs failed to reveal clear evidence for major-effect loci. A distinctive feature of the dataset is that an advanced intercross multiparent population with known founders served as the base population for the GWAS. We leverage this population structure to carry out a second GWAS using imputed haplotype frequency estimates, which, in contrast, revealed localized regions of major effect. A third analysis of the same data, using imputed SNP genotypes derived from the founder haplotype frequency estimates, uncovered a similar major-gene architecture. The key difference between the approaches lies in statistical power: directly ascertained SNP counts carry errors that are limited by sequencing coverage, whereas imputation based on known founders can be considerably more accurate. This work highlights that insufficiently powered GWAS can mask simple genetic architectures and create the illusion of polygenicity through statistical noise alone.
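To make the coverage-dependent power limitation concrete, the following is a minimal sketch of how the sampling error on a directly ascertained allele frequency depends on read depth. It is not the authors' analysis. It assumes a simple binomial read-sampling model, which ignores the additional variance from sampling individuals into the pool, unequal DNA contributions, and sequencing error, so the true error would be larger than shown. The function names `allele_freq_se` and `case_control_diff_se`, the allele frequency of 0.5, and the coverage values are illustrative assumptions.

```python
import math


def allele_freq_se(p: float, coverage: int) -> float:
    """Standard error of an allele frequency estimated from `coverage` reads.

    Assumption: reads are independent binomial draws with success probability p.
    """
    return math.sqrt(p * (1.0 - p) / coverage)


def case_control_diff_se(p: float, cov_case: int, cov_control: int) -> float:
    """Standard error of the case-minus-control frequency difference.

    Assumption: the two pools are sequenced independently, so variances add.
    """
    return math.sqrt(
        allele_freq_se(p, cov_case) ** 2 + allele_freq_se(p, cov_control) ** 2
    )


if __name__ == "__main__":
    # Hypothetical coverages; 700 matches the average depth quoted in the abstract.
    for cov in (50, 200, 700, 2800):
        se = case_control_diff_se(0.5, cov, cov)
        print(f"coverage {cov:>5}x: SE of frequency difference = {se:.4f}")
```

Under this model the error falls only with the square root of coverage, so quadrupling read depth merely halves it. Even at 700× per pool, the frequency difference at a single SNP carries a standard error of a few percent, which can be comparable to the shift expected at many causal sites.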