Calculating and interpreting $F_{ST}$ in the genomics era
Abstract
The relative genetic distance between populations is commonly measured using the fixation index ($F_{ST}$). $F_{ST}$ has traditionally been inferred from allele frequency differences, which raises the question of how it can be estimated and interpreted when analysing genomic datasets with low sample sizes. Here, we advocate an elegant solution first put forward by Hudson et al. (1992): $F_{ST} = (D_{xy} - \pi_{xy})/D_{xy}$, where $D_{xy}$ and $\pi_{xy}$ denote mean sequence dissimilarity between and within populations, respectively. This multi-locus $F_{ST}$ metric can be derived from allele frequency data, but also from sequence alignment data alone, even when sample sizes are low and/or unequal. As with other $F_{ST}$ metrics, the numerator denotes net divergence ($D_a$), which is equivalent to the $f_2$-statistic and Nei's $D$ (for realistic estimates of $D_{xy}$ and $\pi_{xy}$). In terms of demographic inference, net divergence measures the difference between the increases in $D_{xy}$ and $\pi_{xy}$ since the population split. This difference arises because genetic drift reduces coalescence times within populations. Because different combinations of $\Delta D_{xy}$ and $\Delta\pi_{xy}$ can produce identical $F_{ST}$ estimates, no universal relationship exists between $F_{ST}$ and population split time. Still, for recent population splits, when novel mutations are negligible, $F_{ST}$ estimates can be accurately converted into coalescent units ($\tau$, i.e., split time in units of $2N_e$ generations). This in turn allows gene tree discordance to be quantified without the need for multispecies-coalescent-based analyses, using the formula $P_{\text{discordance}} = \tfrac{2}{3}(1 - F_{ST})$. To facilitate the use of the Hudson $F_{ST}$ metric, we implemented new utilities in the R package SambaR.
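The following Python sketch shows how the quantities in the abstract might be computed directly from a small sequence alignment. It computes $D_{xy}$, $\pi_{xy}$, net divergence $D_a$ and the Hudson $F_{ST}$, then converts $F_{ST}$ to $\tau$ and $P_{\text{discordance}}$. It is not the SambaR implementation, and all function names are hypothetical. Several assumptions are made. Sequences are pre-aligned with no gaps or missing data. Dissimilarity is the plain proportion of differing sites (p-distance). Within-population diversity $\pi_{xy}$ is taken as the unweighted mean of the two populations' values. The conversion $\tau = -\ln(1 - F_{ST})$ is not stated in the abstract: it is inferred by combining the abstract's discordance formula with the standard incomplete-lineage-sorting result $P_{\text{discordance}} = \tfrac{2}{3}e^{-\tau}$. It should therefore only be trusted for recent splits where new mutations are negligible.

```python
import math
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ.
    Assumption: no gaps or missing data are handled."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs: list[str]) -> float:
    """Mean pairwise dissimilarity among sequences of one population."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences per population")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(xs: list[str], ys: list[str]) -> float:
    """Mean dissimilarity over all cross-population sequence pairs (D_xy)."""
    pairs = list(product(xs, ys))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop_x: list[str], pop_y: list[str]):
    """Hypothetical helper: returns (F_ST, D_xy, pi_xy, D_a).
    pi_xy is the unweighted mean of the two within-population values."""
    d_xy = mean_between(pop_x, pop_y)
    if d_xy == 0:
        raise ValueError("D_xy is zero; F_ST is undefined")
    pi_xy = (mean_within(pop_x) + mean_within(pop_y)) / 2
    d_a = d_xy - pi_xy  # net divergence; can be negative in small samples
    return d_a / d_xy, d_xy, pi_xy, d_a


def tau_from_fst(fst: float) -> float:
    """Assumed drift-only conversion to coalescent units: tau = -ln(1 - F_ST).
    Only meaningful for recent splits with negligible new mutations."""
    if fst >= 1:
        raise ValueError("F_ST must be < 1 for a finite tau")
    return -math.log(1.0 - fst)


def p_discordance(fst: float) -> float:
    """Probability of gene tree discordance, (2/3) * (1 - F_ST)."""
    return (2.0 / 3.0) * (1.0 - fst)


if __name__ == "__main__":
    # Toy alignment of 10 sites, two sequences per population (made-up data).
    pop_x = ["AAAAAAAAAA", "AAAAAAAAAT"]
    pop_y = ["CCAAAAAAAA", "CCAAAAAAAT"]
    fst, d_xy, pi_xy, d_a = hudson_fst(pop_x, pop_y)
    print(f"D_xy={d_xy:.3f} pi_xy={pi_xy:.3f} D_a={d_a:.3f} F_ST={fst:.3f}")
    print(f"tau={tau_from_fst(fst):.3f} P_discordance={p_discordance(fst):.3f}")
```

On the toy data, $D_{xy} = 0.25$ and $\pi_{xy} = 0.1$, which gives $D_a = 0.15$ and $F_{ST} = 0.6$. Because the sketch defines $\tau$ through $e^{-\tau} = 1 - F_{ST}$, the value $\tfrac{2}{3}e^{-\tau}$ agrees exactly with $\tfrac{2}{3}(1 - F_{ST})$. That agreement is a consequence of the assumed conversion, not an independent check of it.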