Fast Phenotype Simulation for Genotype Representation Graphs
Abstract
Motivation
The Genotype Representation Graph (GRG) [DeHaas et al., 2025] is a graph representation of whole-genome polymorphisms, designed to encode the variant hard-call information in phased whole genomes. It encodes the genotypes as an extremely compact graph that can be traversed efficiently, enabling dynamic programming-style algorithms for applications such as genome-wide association studies that run faster on biobank-scale data than existing alternatives. To facilitate scalable statistical genetics, we present GrgPhenoSim, an extremely fast phenotype simulator for GRGs, suitable for simulating phenotypes on biobank-scale datasets.
Results
GrgPhenoSim contains all the primary functionalities of a phenotype simulator, uses a standardized output, and supports customized simulations. GrgPhenoSim is dozens to hundreds of times faster than tstrait [Tagami et al., 2024], a fast ancestral recombination graph-based phenotype simulator, for sample sizes ranging from thousands to hundreds of thousands of samples.
Availability
The GrgPhenoSim library and use-case demonstrations are available at https://github.com/aprilweilab/grg_pheno_sim
The documentation for GrgPhenoSim is hosted at https://grgl.readthedocs.io/en/latest/index.html