Estimating hierarchical F–statistics from Pool–Seq data

Mathieu Gautier
Marta Coronado-Zamora
Renaud Vitalis


Abstract

Introduced over seventy years ago, F–statistics have been and remain central to population and evolutionary genetics. Among them, F_ST is one of the most commonly used descriptive statistics in empirical studies, notably to characterize the structure of genetic polymorphisms within and between populations, to shed light on the evolutionary history of populations, or to identify marker loci under differential selection for adaptive traits. However, the use of F_ST in simplified population models can overlook important hierarchical structures, such as geographic or temporal subdivisions, potentially leading to misleading interpretations and increasing false positives in genome scans for adaptive differentiation. Hierarchical F–statistics have been introduced to account for multiple predefined levels of population structure. Several estimators have also been proposed, including robust ones implemented in the popular R package hierfstat. Nevertheless, these were primarily designed for individual genotyping data and can be computationally intensive for large genomic datasets. In this study, we extend previous work by developing unbiased method-of-moments estimators for hierarchical F–statistics tailored for Pool–Seq data, a cost-effective alternative to individual genome sequencing. These Pool–Seq estimators have been developed in an ANOVA framework, using definitions based on identity-in-state probabilities. The new estimators have been implemented in an updated version of the R package poolfstat, together with estimators for sample allele count data derived from individual genotyping data. We validate and compare the performance of these estimators through extensive simulations under a hierarchical island model. Finally, we apply these estimators to real Pool–Seq data from Drosophila melanogaster populations, demonstrating their usefulness in revealing population structure and identifying loci with high differentiation within or between groups of subpopulations and associated with spatial or temporal genetic variation.
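To make the phrase "definitions based on identity-in-state probabilities" concrete, the following is a minimal single-locus sketch in Python. It is not the poolfstat API and not the ANOVA-based Pool–Seq estimators described in the abstract. It assumes sample allele counts (not read counts), a two-level hierarchy (subpopulations nested in groups), and the commonly used ratio definitions F_SG = (Q1 - Q2)/(1 - Q2), F_GT = (Q2 - Q3)/(1 - Q3) and F_ST = (Q1 - Q3)/(1 - Q3), where Q1, Q2 and Q3 are the identity-in-state probabilities for two genes sampled within a subpopulation, between subpopulations of the same group, and between groups. All function names, the unweighted averaging, and the example counts are illustrative assumptions.

```python
from itertools import combinations


def iis_within(x, n):
    """Unbiased probability that two distinct genes drawn from one
    sample (x reference alleles out of n genes) are identical in state."""
    return (x * (x - 1) + (n - x) * (n - x - 1)) / (n * (n - 1))


def iis_between(x1, n1, x2, n2):
    """Probability that one gene drawn from each of two samples
    are identical in state."""
    p1, p2 = x1 / n1, x2 / n2
    return p1 * p2 + (1 - p1) * (1 - p2)


def hierarchical_f(counts, groups):
    """Hypothetical helper: counts is a list of (x, n) per subpopulation,
    groups a parallel list of group labels. Uses plain unweighted means,
    which is an assumption and not the weighting of any published estimator."""
    q1_terms = [iis_within(x, n) for x, n in counts]
    q2_terms, q3_terms = [], []
    for i, j in combinations(range(len(counts)), 2):
        q = iis_between(*counts[i], *counts[j])
        # Same group: contributes to Q2; different groups: contributes to Q3.
        (q2_terms if groups[i] == groups[j] else q3_terms).append(q)
    q1 = sum(q1_terms) / len(q1_terms)
    q2 = sum(q2_terms) / len(q2_terms)
    q3 = sum(q3_terms) / len(q3_terms)
    return {
        "F_SG": (q1 - q2) / (1 - q2),  # subpopulations within groups
        "F_GT": (q2 - q3) / (1 - q3),  # groups within the total
        "F_ST": (q1 - q3) / (1 - q3),  # subpopulations within the total
    }


# Made-up allele counts: two subpopulations in each of two groups.
example_counts = [(18, 20), (16, 20), (4, 20), (2, 20)]
example_groups = ["A", "A", "B", "B"]
result = hierarchical_f(example_counts, example_groups)
print(result)
```

Under these definitions the three statistics satisfy (1 - F_ST) = (1 - F_SG)(1 - F_GT), which is why a single global F_ST can hide whether differentiation arises within or between groups. Handling Pool–Seq read counts would additionally require correcting for the sampling of reads within pools, which this sketch does not attempt.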

Version published to 10.1101/2024.11.22.624688 on bioRxiv
Nov 22, 2024
