GraphPop: graph-native computation decouples population genomics complexity from sample count

Abstract

Matrix-based population genomics tools scale as O(V × N), re-reading the full genotype matrix for every analysis. Here we present GraphPop, a graph database engine that reduces summary statistic complexity to O(V × K), where K is the population count and is independent of sample count, by computing on pre-aggregated allele-count arrays stored as graph node properties. The same architecture enables annotation-conditioned queries via edge traversal, persistent analytical records, and multi-statistic composition. Applied to rice 3K (29.6M SNPs, 3,024 accessions) and human 1000 Genomes (3,202 samples, 22 autosomes), GraphPop reveals that all 12 rice subpopulations show π_NS > 1.0, uncovers opposite consequence-level Fst regimes between species, and identifies KCNE1 as a candidate pre-Out-of-Africa sweep via convergence of five stored statistics. GraphPop achieves 146–327× query-time speedup for pre-aggregated statistics and 63–179× for bit-packed haplotype computation (iHS, XP-EHH, nSL), at constant ~160 MB procedure working memory. This complexity reduction makes systematic, annotation-integrated population genomics practical at scale.
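To illustrate why pre-aggregated allele counts decouple cost from sample count, the following minimal Python sketch computes two summary statistics from per-population counts alone. It is not GraphPop's implementation: the `VariantNode` class, the function names, and the choice of estimators (unbiased per-site nucleotide diversity and Hudson's Fst as a ratio of averages) are assumptions made for illustration, and V is read here as the number of variants. Each variant contributes a constant amount of work per population, so the loops touch V × K stored values and never the V × N genotype matrix.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VariantNode:
    """Hypothetical stand-in for a graph node with pre-aggregated counts."""
    # population name -> (alternate allele count, total allele number)
    counts: Dict[str, Tuple[int, int]]


def site_pi(ac: int, an: int) -> float:
    """Unbiased per-site nucleotide diversity for a biallelic site."""
    if an < 2:
        return 0.0
    return 2.0 * ac * (an - ac) / (an * (an - 1))


def mean_pi(variants: List[VariantNode], pop: str) -> float:
    """Average per-site diversity; reads only the stored counts."""
    if not variants:
        return 0.0
    return sum(site_pi(*v.counts[pop]) for v in variants) / len(variants)


def hudson_fst(variants: List[VariantNode], pop_a: str, pop_b: str) -> float:
    """Hudson's Fst as a ratio of summed numerators and denominators."""
    num = 0.0
    den = 0.0
    for v in variants:
        ac1, n1 = v.counts[pop_a]
        ac2, n2 = v.counts[pop_b]
        if n1 < 2 or n2 < 2:
            continue  # too few called alleles to estimate at this site
        p1, p2 = ac1 / n1, ac2 / n2
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else 0.0


# Toy data: two variants, two populations, four called alleles each.
variants = [
    VariantNode({"A": (2, 4), "B": (0, 4)}),
    VariantNode({"A": (4, 4), "B": (0, 4)}),
]
pi_a = mean_pi(variants, "A")
fst_ab = hudson_fst(variants, "A", "B")
```

Adding more samples to a population changes only the integers stored in `counts`, not the number of values each function reads, which is the property the abstract describes.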
