GraphPop: graph-native computation decouples population genomics complexity from sample count

Ehsan Estaji
Shi-Wei Zhao
Zhao-Yang Chen
Shuai Nie
Jian-Feng Mao

Abstract

Matrix-based population genomics tools scale as O(V × N), re-reading the full genotype matrix for every analysis. Here we present GraphPop, a graph database engine that reduces summary statistic complexity to O(V × K), where K is the population count and is independent of sample count, by computing on pre-aggregated allele-count arrays stored as graph node properties. The same architecture enables annotation-conditioned queries via edge traversal, persistent analytical records, and multi-statistic composition. Applied to rice 3K (29.6M SNPs, 3,024 accessions) and human 1000 Genomes (3,202 samples, 22 autosomes), GraphPop reveals that all 12 rice subpopulations show π_N/π_S > 1.0, uncovers opposite consequence-level Fst regimes between species, and identifies KCNE1 as a candidate pre-Out-of-Africa sweep via convergence of five stored statistics. GraphPop achieves 146–327× query-time speedup for pre-aggregated statistics and 63–179× for bit-packed haplotype computation (iHS, XP-EHH, nSL), at constant ~160 MB procedure working memory. This complexity reduction makes systematic, annotation-integrated population genomics practical at scale.
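The O(V × K) claim can be illustrated with a minimal sketch of nucleotide diversity (π) computed only from per-population allele counts. This is not GraphPop's API or storage schema: the `VariantNode` class, its `ac`/`an` fields, and the function names are hypothetical stand-ins for allele-count arrays stored as node properties, and the sketch assumes biallelic sites. The point is that the loop touches V variants and K populations and never reads a per-sample genotype.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantNode:
    """Hypothetical stand-in for a variant node carrying pre-aggregated counts."""
    pos: int
    ac: List[int]  # alternate-allele count per population (length K)
    an: List[int]  # number of called alleles per population (length K)


def site_diversity(ac: int, an: int) -> float:
    """Unbiased per-site heterozygosity for a biallelic site."""
    if an < 2:
        return 0.0  # fewer than two called alleles: no pairwise comparison
    return 2.0 * ac * (an - ac) / (an * (an - 1))


def pi_per_population(variants: List[VariantNode], n_pops: int, n_sites: int) -> List[float]:
    """Average pairwise diversity per population over n_sites accessible sites.

    Work is proportional to V * K: one pass over variants, one inner pass
    over populations, with no dependence on the number of samples.
    """
    totals = [0.0] * n_pops
    for v in variants:              # V iterations
        for k in range(n_pops):     # K iterations
            totals[k] += site_diversity(v.ac[k], v.an[k])
    return [t / n_sites for t in totals]


# Toy input: two variants, two populations, four called alleles each.
variants = [
    VariantNode(pos=100, ac=[2, 0], an=[4, 4]),
    VariantNode(pos=200, ac=[1, 4], an=[4, 4]),
]
pi = pi_per_population(variants, n_pops=2, n_sites=10)
```

In this toy input the second population is monomorphic at both sites (alternate-allele counts of 0 and 4 out of 4), so its π is zero, while the first population accumulates diversity from both variants. Adding more samples would change the stored counts but not the number of loop iterations.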

Version published to 10.64898/2026.04.11.717929 on bioRxiv
Apr 14, 2026

Related articles

General, orders-of-magnitude faster whole-genome analysis with genotype representation graphs

This article has 4 authors:
1. Drew DeHaas
2. Chris Adonizio
3. Ziqing Pan
4. Xinzhu Wei
This article has no evaluations. Latest version: Apr 11, 2026

GraphHDBSCAN*: Graph-based Hierarchical Clustering on High Dimensional Single-cell RNA Sequencing Data

This article has 6 authors:
1. Seyed Ardalan Ghoreishi
2. Aleksandra Weronika Szmigiel
3. James Nagai
4. Ivan G. Costa
5. Arthur Zimek
6. Ricardo J. G. B. Campello
This article has no evaluations. Latest version: Mar 26, 2026

GraphMana: graph-native data management for population genomics projects

This article has 5 authors:
1. Ehsan Estaji
2. Shi-Wei Zhao
3. Zhao-Yang Chen
4. Shuai Nie
5. Jian-Feng Mao
This article has no evaluations. Latest version: Apr 14, 2026
