GraphMana: graph-native data management for population genomics projects
Abstract
Population genomics projects rely on fragmented file-based workflows that lose provenance and require full reprocessing when samples are added. GraphMana stores variant data in a graph database as packed genotype arrays with pre-computed population statistics, enabling incremental sample addition, provenance tracking, cohort management, and export to 17 formats. Two access paths serve different needs: a FAST PATH that reads population-level arrays in O(K) time and a FULL PATH that unpacks per-sample genotypes in O(N) time. On human 1000 Genomes data (3,202 samples, 70.7M variants), GraphMana completed a 46-operation lifecycle in 98 minutes from a single persistent database.
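The difference between the two access paths can be illustrated with a minimal Python sketch. It is not GraphMana's implementation or API. The abstract does not define K or N, so the sketch assumes K is the number of populations and N the number of samples. The 2-bit genotype encoding, the `VariantNode` record, and all function names are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantNode:
    """Hypothetical variant record (not GraphMana's schema)."""
    alt_counts: List[int]     # alternate-allele count per population, length K
    allele_totals: List[int]  # called alleles per population, length K
    packed: bytes             # assumed encoding: 2 bits per sample, 4 samples per byte
    n_samples: int            # N


def pack_genotypes(genotypes: List[int]) -> bytes:
    """Pack alternate-allele dosages (0, 1 or 2) into 2 bits each."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        out[i // 4] |= (g & 0b11) << (2 * (i % 4))
    return bytes(out)


def unpack_genotypes(packed: bytes, n_samples: int) -> List[int]:
    """Recover per-sample dosages from the packed array; touches all N samples."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n_samples)]


def build_variant(genotypes: List[int], populations: List[int], k: int) -> VariantNode:
    """Pre-compute per-population counts once, at import time."""
    alt = [0] * k
    tot = [0] * k
    for g, p in zip(genotypes, populations):
        alt[p] += g
        tot[p] += 2  # diploid samples assumed, no missing calls
    return VariantNode(alt, tot, pack_genotypes(genotypes), len(genotypes))


def fast_path_frequency(v: VariantNode) -> float:
    """Overall alternate-allele frequency from the K pre-computed counts."""
    return sum(v.alt_counts) / sum(v.allele_totals)


def full_path_frequency(v: VariantNode) -> float:
    """Same quantity, recomputed by unpacking all N genotypes."""
    gts = unpack_genotypes(v.packed, v.n_samples)
    return sum(gts) / (2 * len(gts))


# Six samples in three populations (toy data, not from the paper).
variant = build_variant([0, 1, 2, 1, 0, 2], [0, 0, 1, 1, 2, 2], k=3)
print(fast_path_frequency(variant), full_path_frequency(variant))
```

Both functions return the same frequency. The first reads only the K stored summaries, while the second must decode every sample, which is why a summary-level query need not grow with cohort size under this layout.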