Reference genome choice compromises population genetic analyses

Abstract

Characterizing genetic variation in natural populations is vital to evolutionary biology, but many non-model species lack genomic resources. Here, we demonstrate that reference bias significantly affects population genomic analyses by mapping whole-genome sequence data from gray foxes (Urocyon cinereoargenteus) to a conspecific reference and two heterospecific canid genomes (dog and Arctic fox). Mapping to the conspecific genome improved read pairing by ∼5% and detected 26–32% more SNPs and 33–35% more singletons. With the conspecific reference, nucleotide diversity estimates increased by over 30%, F_ST increased from 0.189 to 0.197, and effective population size estimates were 30–60% higher. With heterospecific references, recombination rates varied by up to 3-fold at chromosome ends. Importantly, F_ST outlier detection differed markedly, with heterospecific genomes identifying twice as many unique outlier windows. These findings highlight the impact of reference genome choice and the importance of conspecific genomic resources for accurate evolutionary inference.

Highlights

  • A species-specific reference genome improves read mapping and variant detection

  • Reference bias leads to underestimates of genetic diversity and differentiation

  • Divergent reference genomes distort demographic histories and recombination landscapes

  • Unique F_ST outliers are detected across references, affecting functional interpretations
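
The link the abstract draws between missed SNPs (especially singletons) and lower nucleotide diversity can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: the function name `nucleotide_diversity`, the allele counts, the sample size, and the number of callable sites are all hypothetical values chosen for illustration. It computes per-site nucleotide diversity as the mean number of pairwise differences across biallelic SNPs, once with all SNPs and once with the singletons removed, mimicking a call set in which rare variants go undetected.

```python
from typing import Sequence


def nucleotide_diversity(alt_counts: Sequence[int], n_chrom: int, n_sites: int) -> float:
    """Per-site nucleotide diversity from alternate-allele counts at biallelic SNPs.

    alt_counts: alternate-allele count at each SNP (hypothetical toy input)
    n_chrom:    number of sampled chromosomes (2 x diploid individuals)
    n_sites:    number of callable sites, variant and invariant
    """
    if n_chrom < 2 or n_sites <= 0:
        raise ValueError("need at least 2 chromosomes and 1 callable site")
    n_pairs = n_chrom * (n_chrom - 1) / 2
    # At one SNP, k * (n - k) of the n-choose-2 chromosome pairs differ.
    mean_pairwise_diffs = sum(k * (n_chrom - k) for k in alt_counts) / n_pairs
    return mean_pairwise_diffs / n_sites


# Hypothetical call set: 5 diploid individuals (10 chromosomes), 1000 callable sites.
n_chrom, n_sites = 10, 1000
all_snps = [1, 1, 1, 3, 5]                        # three singletons, two commoner SNPs
without_singletons = [k for k in all_snps if k != 1]

pi_all = nucleotide_diversity(all_snps, n_chrom, n_sites)
pi_missing = nucleotide_diversity(without_singletons, n_chrom, n_sites)

print(f"pi with all SNPs:        {pi_all:.6f}")
print(f"pi with singletons lost: {pi_missing:.6f}")
```

Every SNP contributes a non-negative term to the sum, so any variant that goes uncalled can only lower the estimate. Real analyses would also need to account for how the number of callable sites changes between references, which this toy example holds fixed.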
