GPU accelerated population genetics statistics using pg_gpu


Abstract

Population genetics summary statistics (diversity, divergence, linkage disequilibrium, selection scans, and dimensionality reduction) are fundamental across human, agricultural, and ecological genomics. As whole-genome sequencing datasets have grown to hundreds of thousands of individuals, the cost of computing these statistics on conventional CPU implementations has become a major bottleneck: windowed scans of a single chromosome arm can take hours to days, and computation of pairwise linkage-disequilibrium statistics useful for demographic inference scales as $O(n^2)$ in sample size, often exceeding wall-clock budgets entirely. We present pg_gpu, a Python library implementing a comprehensive catalog of population-genetics summary statistics as fused CUDA kernels on NVIDIA GPUs. pg_gpu covers eleven categories spanning diversity and neutrality tests, divergence, admixture, the site-frequency spectrum, linkage disequilibrium, haplotype-based selection scans, dimensionality reduction (PCA, randomized PCA, local PCA / lostruct), distance distributions, relatedness, resampling, and a generalized weighted-SFS framework for custom ω estimators. On the full Ag1000G Phase 3 chromosome 3R arm (2,940 haplotypes, 10.9 million variants), pg_gpu agrees with scikit-allel and PLINK2 to machine precision while delivering a median 139× and maximum 1,096× speedup. For the multi-population LD statistics used by moments for demographic inference, pg_gpu is a drop-in replacement that yields a ~1,750-fold speedup over the native implementation. Whole chromosome arm scans, lostruct screens, and calculation of LD statistics complete on a single NVIDIA A100 in seconds to a few minutes.
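To make concrete the kind of windowed scan the abstract refers to, the following is a minimal CPU sketch of per-window nucleotide diversity in NumPy. It is an illustration only and does not reproduce the pg_gpu API: the function name `windowed_diversity`, its parameters, the 0-based coordinates, and the choice to normalize by window length are all assumptions made for this example.

```python
import numpy as np


def windowed_diversity(haplotypes, positions, window_size):
    """Nucleotide diversity (pi) per fixed-size window (hypothetical helper).

    haplotypes : (n_variants, n_haplotypes) array of 0/1 alleles
    positions  : (n_variants,) non-negative integer coordinates (assumed 0-based)
    window_size: window length in base pairs
    """
    hap = np.asarray(haplotypes)
    pos = np.asarray(positions, dtype=np.int64)
    n = hap.shape[1]

    # Count of the alternate allele at each site.
    alt_count = hap.sum(axis=1)

    # Mean number of pairwise differences at a biallelic site:
    # 2 * k * (n - k) / (n * (n - 1)), where k is the alternate-allele count.
    per_site = 2.0 * alt_count * (n - alt_count) / (n * (n - 1))

    # Assign each variant to a window and sum per-site values within windows.
    window_index = pos // window_size
    n_windows = int(window_index.max()) + 1
    totals = np.bincount(window_index, weights=per_site, minlength=n_windows)

    # Normalize by window length (assumes every base in the window is callable).
    return totals / window_size


if __name__ == "__main__":
    toy_haplotypes = [[0, 1, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
    toy_positions = [1, 5, 12]
    print(windowed_diversity(toy_haplotypes, toy_positions, window_size=10))
```

Each site's contribution depends only on its own allele count, so the per-site step is independent across variants. That independence is what makes statistics of this kind natural candidates for GPU execution, although how pg_gpu structures its kernels is not described here.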
