GPU accelerated population genetics statistics using pg_gpu

Nathaniel S. Pope
Angel G. Rivera-Colón
Ananya Kapoor
Kevin Korfmann
Murillo F. Rodrigues
Scott T. Small
Anastasia A. Teterina
Andrew D. Kern

Abstract

Population genetics summary statistics (diversity, divergence, linkage disequilibrium, selection scans, and dimensionality reduction) are fundamental across human, agricultural, and ecological genomics. As whole-genome sequencing datasets have grown to hundreds of thousands of individuals, the cost of computing these statistics with conventional CPU implementations has become a major bottleneck: windowed scans of a single chromosome arm can take hours to days, and computation of pairwise linkage-disequilibrium statistics useful for demographic inference scales as O(n²) in sample size, often exceeding wall-clock budgets entirely. We present pg_gpu, a Python library implementing a comprehensive catalog of population-genetics summary statistics as fused CUDA kernels on NVIDIA GPUs. pg_gpu covers eleven categories spanning diversity and neutrality tests, divergence, admixture, the site-frequency spectrum, linkage disequilibrium, haplotype-based selection scans, dimensionality reduction (PCA, randomized PCA, local PCA / lostruct), distance distributions, relatedness, resampling, and a generalized weighted-SFS framework for custom ω estimators. On the full Ag1000G Phase 3 chromosome 3R arm (2,940 haplotypes, 10.9 million variants), pg_gpu agrees with scikit-allel and PLINK2 to machine precision while delivering a median 139× and maximum 1,096× speedup. For the multi-population LD statistics used by moments for demographic inference, pg_gpu is a drop-in replacement that yields a ~1,750-fold speedup over the native implementation. Whole-chromosome-arm scans, lostruct screens, and calculation of LD statistics complete on a single NVIDIA A100 in seconds to a few minutes.
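To make concrete what a windowed scan computes, the following is a minimal CPU sketch in NumPy of windowed nucleotide diversity from a 0/1 haplotype matrix. It is an illustration only and is not the pg_gpu API: the function names `per_site_pairwise_diff` and `windowed_pi`, the (variants × haplotypes) layout, and the half-open window convention are all assumptions made for this example.

```python
import numpy as np


def per_site_pairwise_diff(haplotypes):
    """Mean pairwise differences at each biallelic site.

    haplotypes: (n_variants, n_haplotypes) array of 0/1 allele codes
    (hypothetical layout chosen for this sketch).
    """
    n = haplotypes.shape[1]
    alt = haplotypes.sum(axis=1)  # alternate-allele count per site
    # Fraction of haplotype pairs that differ at the site.
    return 2.0 * alt * (n - alt) / (n * (n - 1))


def windowed_pi(haplotypes, positions, start, stop, window_size):
    """Per-base nucleotide diversity in windows [start + k*w, start + (k+1)*w).

    Assumes every position lies in [start, stop).
    """
    site_pi = per_site_pairwise_diff(haplotypes)
    n_windows = int(np.ceil((stop - start) / window_size))
    window_index = (np.asarray(positions) - start) // window_size
    # Sum per-site values within each window, then scale by window length.
    sums = np.bincount(window_index, weights=site_pi, minlength=n_windows)
    return sums / window_size


# Toy input: 3 variants, 4 haplotypes, two 10-bp windows.
hap = np.array([[0, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 1, 1, 1]])
pos = np.array([5, 12, 18])
print(windowed_pi(hap, pos, start=0, stop=20, window_size=10))
```

Each site and each window is independent of the others, which is the kind of structure that maps naturally onto parallel hardware; pairwise LD statistics, by contrast, involve work over pairs and are correspondingly more expensive.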

Version published to 10.64898/2026.05.29.728868 on bioRxiv
May 31, 2026
