assessPool: a flexible pipeline for population genomic analyses of pooled sequencing data

Abstract

Despite the dramatic decrease in high-throughput sequencing costs over time, sequencing the ideal number of individuals for population genetic inference remains prohibitively expensive. When research questions require only population-level resolution, pooling individual samples before sequencing (pool-seq) can substantially reduce costs while still providing allele frequencies of single nucleotide polymorphisms (SNPs). However, analyzing pooled data is comparatively difficult and less standardized than individual-based analyses. Although several programs have been developed to handle pool-seq data, most require extensive formatting or programming skills to operate. Here we introduce assessPool, an open-source R and Bash pipeline for pool-seq analyses with a focus on population structure. AssessPool accepts a Variant Call Format (VCF) file and a FASTA-formatted reference, providing a straightforward transition from commonly used pipelines such as Stacks or dDocent. AssessPool handles varying numbers of pools and utilizes PoPoolation2 to generate locus-by-locus pairwise FST values and associated Fisher's exact test values as measures of population structure. Starting with a VCF file containing all identified SNPs, assessPool facilitates several key functionalities for population genetic analyses: i) filtering SNPs based on adjustable criteria with parameter suggestions for pool-seq data, ii) organizing data structures for analysis based on input pools, iii) creating customizable run scripts for FST calculations using PoPoolation2 and/or the {poolfstat} R package, for all pairwise comparisons, iv) calculating locus-specific FST values using PoPoolation2 and/or {poolfstat}, v) importing FST output into a format compatible with R, vi) producing population genomic summary statistics, and vii) generating interactive plots to visualize and explore data.
A pooled dataset generated from wild populations is used here to showcase the features of the assessPool pipeline for population genomic analyses.
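
To make the idea of locus-by-locus pairwise comparisons concrete, the following Python sketch enumerates every pair of pools at a single SNP and computes an allele frequency difference and a two-sided Fisher's exact test p-value from pooled read counts. It is only an illustration under stated assumptions: it is not the code used by assessPool, PoPoolation2, or {poolfstat}, it does not compute FST, and the function names and the input layout (a dictionary mapping each pool name to reference and alternate read counts) are hypothetical.

```python
from itertools import combinations
from math import comb


def allele_frequency(ref_count, alt_count):
    """Alternate-allele frequency estimated from pooled read counts."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_count / total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def pairwise_comparisons(pool_counts):
    """Compare every pair of pools at one SNP.

    pool_counts: hypothetical layout, pool name -> (ref_count, alt_count).
    """
    results = {}
    for p1, p2 in combinations(sorted(pool_counts), 2):
        r1, a1 = pool_counts[p1]
        r2, a2 = pool_counts[p2]
        results[(p1, p2)] = {
            "freq_diff": abs(allele_frequency(r1, a1) - allele_frequency(r2, a2)),
            "fisher_p": fisher_exact_two_sided(r1, a1, r2, a2),
        }
    return results


if __name__ == "__main__":
    # Made-up read counts for three pools at a single SNP.
    counts = {"poolA": (30, 10), "poolB": (10, 30), "poolC": (30, 10)}
    for pair, stats in pairwise_comparisons(counts).items():
        print(pair, stats)
```

With n pools, the loop visits n(n-1)/2 pairs, which is why a pipeline that handles varying numbers of pools benefits from generating the pairwise run scripts automatically.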
